
Unit 1:

Business Analytics: Overview of Business Analytics, Scope of Business Analytics, Business Analytics Process, Relationship of the Business Analytics Process and the Organisation, Competitive Advantages of Business Analytics.
Statistical Tools: Statistical Notation, Descriptive Statistical Methods, Review of Probability Distribution and Data Modelling, Sampling and Estimation Methods Overview.
Overview of Business Analytics

Business Analytics (BA) is the practice of using data analysis and statistical methods to derive
insights and make informed business decisions.
It involves the use of various tools, techniques, and technologies to collect, process, and analyze
data, with the ultimate goal of providing valuable information to support organizational decision-
making.

Business Analytics encompasses a wide range of activities, including:

1. **Descriptive Analytics:** This involves the use of historical data to understand and describe
what has happened in the past. It includes techniques such as data visualization, reporting, and
summarization to provide a clear picture of historical trends and patterns.

2. **Predictive Analytics:** Predictive analytics involves the use of statistical algorithms and
machine learning techniques to identify patterns and make predictions about future events. This can
help organizations anticipate trends, forecast demand, and make proactive decisions.

3. **Prescriptive Analytics:** This form of analytics goes beyond predicting future outcomes and
provides recommendations on what actions to take to optimize a given situation. It involves the use
of optimization and simulation techniques to determine the best course of action.

4. **Diagnostic Analytics:** Diagnostic analytics involves examining data to understand why a certain event or outcome occurred. It helps in identifying the root causes of problems or issues and is crucial for making data-driven improvements.
5. **Big Data Analytics:** With the advent of big data, organizations are now dealing with vast
amounts of data from various sources. Big Data Analytics involves the use of advanced analytics
techniques to process and analyze large datasets that traditional analytics tools might struggle to
handle.

Business Analytics is applied across various functional areas within an organization, including
finance, marketing, operations, human resources, and more. It plays a crucial role in gaining a
competitive advantage, improving operational efficiency, identifying new business opportunities,
and mitigating risks.
Key components of Business Analytics include data collection, data cleaning and preprocessing,
exploratory data analysis, modeling, and interpretation of results. Businesses often use specialized
tools and platforms, such as data visualization tools, statistical software, and business intelligence
platforms, to carry out these tasks.
Scope of Business Analytics:
The scope of Business Analytics is expansive and continues to evolve as technology advances and
organizations recognize the value of data-driven decision-making. Here are some key aspects that
define the scope of Business Analytics:

1. **Industry Applications:** Business Analytics is applicable across various industries, including finance, healthcare, retail, manufacturing, telecommunications, and more. Virtually any industry that generates and collects data can benefit from the insights derived through analytics.
2. **Functional Areas:** It is used in different functional areas within an organization, such as
marketing, finance, operations, human resources, supply chain management, and customer service.
Each of these areas can leverage analytics to optimize processes, enhance performance, and achieve
strategic objectives.
3. **Descriptive, Predictive, and Prescriptive Analytics:** The scope of Business Analytics
encompasses all three main types of analytics. Descriptive analytics helps organizations understand
historical data, predictive analytics enables forecasting and trend analysis, and prescriptive analytics
provides actionable recommendations to improve decision-making.
4. **Data Variety:** Business Analytics deals with diverse types of data, including structured data
(such as databases and spreadsheets), unstructured data (such as text and social media posts), and
semi-structured data. The ability to analyze and derive insights from different data sources is a
crucial aspect of its scope.
5. **Big Data Analytics:** As organizations deal with massive volumes of data, the scope of
Business Analytics has expanded to include Big Data Analytics. This involves processing and
analyzing large datasets using advanced technologies and tools, such as Hadoop and Spark.
6. **Data Visualization:** The visualization of data is a significant part of Business Analytics.
Tools for data visualization help in presenting complex information in a visually understandable
format, making it easier for decision-makers to comprehend and act upon insights.
7. **Machine Learning and Artificial Intelligence:** The integration of machine learning (ML)
and artificial intelligence (AI) technologies has expanded the scope of Business Analytics. These
technologies enable more advanced predictive modeling, anomaly detection, and automation of
certain analytical processes.
8. **Decision Support Systems:** Business Analytics contributes to the development and
implementation of decision support systems (DSS). These systems assist decision-makers by
providing relevant information and analytics to support strategic and tactical decisions.
9. **Continuous Improvement:** Business Analytics is not a one-time activity; it involves a
continuous improvement cycle. Organizations use analytics to monitor performance, identify areas
for improvement, and adjust strategies accordingly.
10. **Risk Management:** Business Analytics plays a crucial role in identifying and mitigating
risks. Through predictive analytics, organizations can assess potential risks and take proactive
measures to minimize their impact.
Business Analytics Process
The complete business analytic process involves the three major component steps applied
sequentially to a source of data (see Figure 1.1). The outcome of the business analytic process must
relate to business and seek to improve business performance in some way.

Figure 1.1 Business analytic process


The logic of the BA process in Figure 1.1 is initially based on a question: What valuable or
problem-solving information is locked up in the sources of data that an organization has available?
At each of the three steps that make up the BA process, additional questions need to be answered,
as shown in Figure 1.1. Answering all these questions requires mining the information out of the
data via the three steps of analysis that comprise the BA process. The analogy of digging in a mine
is appropriate for the BA process because finding new, unique, and valuable information that can
lead to a successful strategy is just as good as finding gold in a mine. SAS, a major analytic
corporation (www.sas.com), actually has a step in its BA process, Query Drilldown, which refers to
the mining effort of questioning and finding answers to pull up useful information in the BA
analysis. Many firms routinely undertake BA to solve specific problems, while other firms
undertake BA to explore and discover new knowledge to guide organizational planning and
decision-making to improve business performance.
The size of some data sources can be unmanageable, overly complex, and generally confusing.
Sorting out data and trying to make sense of its informational value requires the application of
descriptive analytics as a first step in the BA process. One might begin simply by sorting the data
into groups using the four possible classifications presented in Table 1.4. Also, incorporating some
of the data into spreadsheets like Excel and preparing cross tabulations and contingency tables are
means of restricting the data into a more manageable data structure. Simple measures of central
tendency and dispersion might be computed to try to capture possible opportunities for business
improvement. Other descriptive analytic summarization methods, including charting, plotting, and
graphing, can help decision makers visualize the data to better understand content opportunities.
Table 1.4 Types of Data Measurement Classification Scales

Categorical Data: Data that is grouped by one or more characteristics. Categorical data usually involves cardinal numbers counted or expressed as percentages. Example 1: Product markets that can be characterized by categories of “high-end” products or “low-income” products, based on dollar sales. It is common to use this term to apply to data sets that contain items identified by categories as well as observations summarized in cross-tabulations or contingency tables.

Ordinal Data: Data that is ranked or ordered to show relational preference. Example 1: Football team rankings not based on points scored but on wins. Example 2: Ranking of business firms based on product quality.

Interval Data: Data that is arranged along a scale where each value is equally distant from others. It is also ordinal data. Example 1: A temperature gauge. Example 2: A survey instrument using a Likert scale (that is, 1, 2, 3, 4, 5, 6, 7), where the interval from 1 to 2 is perceived as equidistant to the interval from 2 to 3, and so on. Note: In ordinal data, the ranking of firms might vary greatly from first place to second, but in interval data, they would have to be relationally proportional.

Ratio Data: Data expressed as a ratio on a continuous scale. Example 1: The ratio of firms with green manufacturing programs is twice that of firms without such a program.
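To make this first, descriptive step concrete, the short sketch below builds a contingency table and simple summary measures of the kind described above. It is only a minimal illustration: the region and segment labels and the sales figures are made up, and it assumes the pandas library is available.

```python
# Minimal sketch of the descriptive first step: a contingency table and
# summary measures on a small, made-up sales file (assumes pandas is installed).
import pandas as pd

df = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South", "West"],
    "segment": ["high-end", "low-income", "high-end", "high-end", "low-income", "high-end"],
    "sales":   [120, 45, 150, 130, 60, 140],   # hypothetical sales in $000s
})

# Cross-tabulation (contingency table) of counts by region and market segment
print(pd.crosstab(df["region"], df["segment"]))

# Simple measures of central tendency and dispersion for sales
print(df["sales"].describe())
```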

From Step 1 in the Descriptive Analytic analysis (see Figure 1.1), some patterns or variables of
business behavior should be identified representing targets of business opportunities and possible
(but not yet defined) future trend behavior. Additional effort (more mining) might be required, such
as the generation of detailed statistical reports narrowly focused on the data related to targets of
business opportunities to explain what is taking place in the data (what happened in the past). This
is like a statistical search for predictive variables in data that may lead to patterns of behavior a firm
might take advantage of if the patterns of behavior occur in the future. For example, a firm might find in its general sales information that during economic downturns, certain products are sold to customers of a particular income level if certain advertising is undertaken. The sales, customers,
and advertising variables may be in the form of any of the measurable scales for data in Table 1.4,
but they have to meet the three conditions of BA previously mentioned: clear relevancy to business,
an implementable resulting insight, and performance and value measurement capabilities.
To determine whether observed trends and behavior found in the relationships of the descriptive
analysis of Step 1 actually exist or hold true and can be used to forecast or predict the future, more
advanced analysis is undertaken in Step 2, Predictive Analytic analysis, of the BA process. There
are many methods that can be used in this step of the BA process. A commonly used methodology
is multiple regression. (See Appendix A, “Statistical Tools,” and Appendix E, “Forecasting,” for a
discussion on multiple regression and ANOVA testing.) This methodology is ideal for establishing
whether a statistical relationship exists between the predictive variables found in the descriptive
analysis. The relationship might be to show that a dependent variable is predictively associated with
business value or performance of some kind. For example, a firm might want to determine which of
several promotion efforts (independent variables measured and represented in the model by dollars
in TV ads, radio ads, personal selling, and/or magazine ads) is most efficient in generating customer
sale dollars (the dependent variable and a measure of business performance). Care would have to be
taken to ensure the multiple regression model was used in a valid and reliable way, which is why
ANOVA and other statistical confirmatory analyses are used to support the model development.
Exploring a database using advanced statistical procedures to verify and confirm the best predictive
variables is an important part of this step in the BA process. This answers the questions of what is
currently happening and why it happened between the variables in the model.
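As a hedged illustration of this step, the sketch below fits a multiple regression of sales on promotion spending using Python's statsmodels library. The column names (tv_ads, radio_ads, sales) and all figures are hypothetical, not taken from the text.

```python
# Minimal multiple regression sketch (hypothetical promotion data; assumes
# pandas and statsmodels are installed).
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "tv_ads":    [50, 60, 45, 70, 80, 65, 55, 75],        # $000s spent per period
    "radio_ads": [20, 25, 15, 30, 35, 28, 22, 32],
    "sales":     [210, 245, 190, 280, 310, 260, 225, 295],
})

X = sm.add_constant(data[["tv_ads", "radio_ads"]])   # add the intercept term
model = sm.OLS(data["sales"], X).fit()               # ordinary least squares fit

# The summary reports coefficients, R-squared, and the ANOVA-style F-test
# used to confirm the overall relationship.
print(model.summary())
```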
A single or multiple regression model can often forecast a trend line into the future. When
regression is not practical, other forecasting methods (exponential smoothing, moving averages)
can be applied as predictive analytics to develop needed forecasts of business trends. (See
Appendix E.) The identification of future trends is the main output of Step 2 and the predictive
analytics used to find them. This helps answer the question of what will happen.
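For instance, a simple exponential smoothing forecast might be produced as in the sketch below. The monthly sales series is made up, and the sketch assumes a recent version of the statsmodels library is available.

```python
# Minimal exponential smoothing sketch on a made-up monthly sales series.
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

sales = pd.Series([120, 132, 128, 141, 150, 147, 158, 166])  # hypothetical data

fit = SimpleExpSmoothing(sales, initialization_method="estimated").fit()
print("Smoothing level (alpha):", fit.params["smoothing_level"])
print("Forecast for the next 3 periods:", fit.forecast(3).tolist())
```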
If a firm knows where the future lies by forecasting trends as they would in Step 2 of the BA
process, it can then take advantage of any possible opportunities predicted in that future state. In
Step 3, Prescriptive Analytics analysis, operations research methodologies can be used to optimally
allocate a firm’s limited resources to take best advantage of the opportunities it found in the
predicted future trends. Limits on human, technology, and financial resources prevent any firm
from going after all opportunities they may have available at any one time. Using prescriptive
analytics allows the firm to allocate limited resources to optimally achieve objectives as fully as
possible. For example, linear programming (a constrained optimization methodology) has been
used to maximize the profit in the design of supply chains (Paksoy et al., 2013). (Note: Linear
programming and other optimization methods are presented in Appendixes B, “Linear
Programming,” C, “Duality and Sensitivity Analysis in Linear Programming,” and D, “Integer
Programming.”) This third step in the BA process answers the question of how best to allocate and
manage decision-making in the future.
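A tiny prescriptive sketch of this idea, using a linear program solved with scipy, is shown below; the profit figures and resource limits are invented for illustration and are not drawn from the text.

```python
# Minimal linear programming sketch: choose production quantities of two
# products to maximize profit subject to limited labor and machine hours.
from scipy.optimize import linprog

c = [-40, -30]            # profit per unit of products A and B (negated: linprog minimizes)
A_ub = [[2, 1],           # labor hours used per unit of A and B
        [1, 3]]           # machine hours used per unit of A and B
b_ub = [100, 90]          # labor and machine hours available

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("Optimal production plan (units of A, B):", res.x)
print("Maximum profit:", -res.fun)
```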
Relationship of the Business Analytics Process and the Organisation
The BA process can solve problems and identify opportunities to improve business performance. In
the process, organizations may also determine strategies to guide operations and help achieve
competitive advantages. Typically, solving problems and identifying strategic opportunities to
follow are organization decision-making tasks. The latter, identifying opportunities, can be viewed
as a problem of strategy choice requiring a solution. It should come as no surprise that the BA
process described in Section 1.2 closely parallels classic organization decision-making processes.
As depicted in Figure 1.2, the business analytic process has an inherent relationship to the steps in
typical organization decision-making processes.

Figure 1.2 Comparison of business analytics and organization decision-making processes


The organization decision-making process (ODMP) developed by Elbing (1970) and presented
in Figure 1.2 is focused on decision making to solve problems but could also be applied to finding
opportunities in data and deciding what is the best course of action to take advantage of them. The
five-step ODMP begins with the perception of disequilibrium, or the awareness that a problem
exists that needs a decision. Similarly, in the BA process, the first step is to recognize that databases
may contain information that could both solve problems and find opportunities to improve business
performance. Then in Step 2 of the ODMP, an exploration of the problem to determine its size,
impact, and other factors is undertaken to diagnose what the problem is. Likewise, the BA
descriptive analytic analysis explores factors that might prove useful in solving problems and
offering opportunities. The ODMP problem statement step is similarly structured to the BA
predictive analysis to find strategies, paths, or trends that clearly define a problem or opportunity
for an organization to solve problems. Finally, the ODMP’s last steps of strategy selection and
implementation involve the same kinds of tasks that the BA process requires in the final
prescriptive step (make an optimal selection of resource allocations that can be implemented for the
betterment of the organization).
The decision-making foundation that has served ODMP for many decades parallels the BA process.
The same logic serves both processes and supports organization decision-making skills and
capacities.
Competitive advantages of Business Analytics.
Business Analytics provides organizations with various competitive advantages by leveraging data
to gain insights, make informed decisions, and drive strategic initiatives. Here are some key
competitive advantages associated with the effective use of Business Analytics:
1. **Informed Decision-Making:**
- Business Analytics enables organizations to make well-informed and data-driven decisions. By
analyzing historical data and predicting future trends, decision-makers can have a clearer
understanding of the potential outcomes of different choices, reducing the reliance on intuition or
gut feeling.
2. **Operational Efficiency:**
- Analytics helps organizations optimize their operations by identifying inefficiencies and areas
for improvement. This can lead to cost savings, streamlined processes, and enhanced productivity,
contributing to a more efficient and agile business environment.
3. **Better Customer Insights:**
- Business Analytics allows organizations to gain a deeper understanding of their customers. By
analyzing customer behavior, preferences, and feedback, businesses can tailor their products,
services, and marketing strategies to better meet customer needs, leading to increased customer
satisfaction and loyalty.
4. **Market and Competitive Analysis:**
- Analytics enables organizations to conduct thorough market and competitive analysis. By
examining market trends, competitor performance, and customer sentiment, businesses can identify
opportunities for growth, assess market risks, and stay ahead of industry developments.
5. **Strategic Planning and Forecasting:**
- Through predictive analytics, organizations can forecast future trends and market conditions.
This forecasting capability is invaluable for strategic planning, allowing businesses to anticipate
changes, adapt to market shifts, and proactively position themselves for success.
6. **Risk Management:**
- Business Analytics helps identify and mitigate risks by analyzing historical data and predicting
potential future risks. This proactive approach to risk management allows organizations to develop
strategies to mitigate negative impacts and seize opportunities even in uncertain environments.
7. **Personalized Marketing and Customer Engagement:**
- Analytics enables personalized marketing strategies by tailoring messages and offerings based
on individual customer preferences and behaviors. This targeted approach can improve customer
engagement, increase conversion rates, and boost the effectiveness of marketing campaigns.
8. **Supply Chain Optimization:**
- Businesses can optimize their supply chains by using analytics to improve inventory
management, demand forecasting, and supplier relationships. This optimization can lead to cost
reductions, faster delivery times, and increased overall supply chain efficiency.
9. **Enhanced Product and Service Innovation:**
- By analyzing customer feedback, market trends, and competitive landscapes, organizations can
identify opportunities for innovation. This insight-driven innovation can lead to the development of
new and improved products and services, giving businesses a competitive edge in the market.
10. **Adaptation to Changing Business Conditions:**
- Analytics provides the agility for organizations to adapt to changing business conditions.
Whether it's responding to shifts in consumer behavior, economic changes, or industry disruptions,
businesses equipped with analytics can adjust their strategies and operations more effectively.
11. **Continuous Improvement Culture:**
- Business Analytics fosters a culture of continuous improvement. By regularly analyzing
performance metrics and feedback, organizations can identify areas for enhancement, iterate on
strategies, and stay responsive to evolving market dynamics.
In summary, Business Analytics offers a range of competitive advantages, empowering
organizations to make informed decisions, optimize operations, and stay ahead in an increasingly
dynamic and competitive business landscape. The ability to harness the power of data for strategic
purposes can be a key differentiator in achieving long-term success.
Statistical Tools: Statistical Notation
Statistical notation is a standardized way of representing statistical concepts and formulas using
symbols and letters. It is commonly used in statistical analysis to express mathematical
relationships and formulas concisely. Here are some commonly used statistical notations:
1. **Population Parameters:**
- **Population Mean:** μ (mu)
- **Population Standard Deviation:** σ (sigma)
- **Population Variance:** σ² (sigma squared)
2. **Sample Statistics:**
- **Sample Mean:** x̄ (x-bar)
- **Sample Standard Deviation:** s
- **Sample Variance:** s²
3. **Random Variables:**
- **Random Variable:** X
- **Probability of X:** P(X)
- **Expected Value (Mean) of X:** E(X) or μ (mu) for population mean
- **Variance of X:** Var(X) or σ² (sigma squared) for population variance
- **Standard Deviation of X:** SD(X) or σ (sigma) for population standard deviation
4. **Probability Notation:**
- **Probability of an Event A:** P(A)
- **Complement of A:** P(A') or P(not A)
- **Intersection of Events A and B:** P(A ∩ B)
- **Union of Events A and B:** P(A ∪ B)
5. **Statistical Distributions:**
- **Normal Distribution:** N(μ, σ²) for a normal distribution with mean μ and variance σ²
- **Binomial Distribution:** B(n, p) for a binomial distribution with parameters n (number of
trials) and p (probability of success)
- **Poisson Distribution:** P(λ) for a Poisson distribution with parameter λ (average rate of
events)
6. **Hypothesis Testing:**
- **Null Hypothesis:** H₀
- **Alternative Hypothesis:** H₁
- **Significance Level (Alpha):** α
- **Test Statistic:** Z, t, χ² (depending on the test)
7. **Regression Analysis:**
- **Regression Coefficients:** β₀ (intercept), β₁, β₂, ... (slope coefficients)
- **Predicted (Fitted) Values:** ŷ
- **Residuals:** e (difference between observed and predicted values)
8. **Summation Notation:**
- **Summation (Sigma) Notation:** Σ (uppercase sigma)
- **Summation of a Sequence:** Σ Xi (sum of all X values in a sequence)
9. **Correlation Coefficient:**
- **Pearson Correlation Coefficient:** r
10. **Confidence Intervals:**
- **Confidence Interval for Mean:** \(\bar{X} \pm Z \dfrac{s}{\sqrt{n}}\), where Z is the Z-score for the chosen confidence level.
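As a brief worked example with made-up numbers: for a sample of \(n = 25\) with \(\bar{X} = 50\) and \(s = 10\), the 95% confidence interval (Z ≈ 1.96) is

\[ 50 \pm 1.96 \times \dfrac{10}{\sqrt{25}} = 50 \pm 3.92, \quad \text{i.e., } (46.08,\ 53.92). \]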

These notations are widely used in statistical literature, research papers, and educational materials
to represent statistical concepts and calculations concisely and consistently. Understanding and
using statistical notation is crucial for effective communication in the field of statistics.
Descriptive Statistical Methods
Descriptive statistics are methods used to summarize and describe the main features of a dataset.
These methods provide a way to organize and simplify large amounts of data, making it more
understandable. Here are some common descriptive statistical methods:
1. **Measures of Central Tendency:**
- **Mean (Average):** The sum of all values divided by the number of values in the dataset.
- **Median:** The middle value in a dataset when it is ordered. It is less sensitive to extreme values than the mean.
- **Mode:** The value that occurs most frequently in a dataset.
2. **Measures of Dispersion (Variability):**
- **Range:** The difference between the maximum and minimum values in a dataset.
- **Variance:** The average of the squared differences from the mean.
- **Standard Deviation:** The square root of the variance, providing a measure of the average distance of data points from the mean.

3. **Frequency Distributions:**
- **Frequency:** The number of times a particular value occurs in a dataset.
- **Relative Frequency:** The proportion of times a value occurs relative to the total number of
observations.
- **Histograms:** A graphical representation of the distribution of a dataset, showing the
frequency of different values.
4. **Percentiles and Quartiles:**
- **Percentiles:** Values below which a given percentage of data falls. The 50th percentile is the
median.
- **Quartiles:** Values that divide a dataset into four equal parts. The first quartile (Q1) is the
25th percentile, the second quartile (Q2) is the median, and the third quartile (Q3) is the 75th
percentile.
5. **Skewness and Kurtosis:**
- **Skewness:** Measures the asymmetry of a distribution. A skewness of 0 indicates a perfectly
symmetrical distribution.
- **Kurtosis:** Measures the "tailedness" of a distribution. Positive kurtosis indicates heavier
tails, and negative kurtosis indicates lighter tails compared to a normal distribution.
6. **Correlation Coefficient:**
- **Pearson Correlation Coefficient (r):** Measures the strength and direction of a linear
relationship between two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect
positive correlation).
7. **Central Limit Theorem:**
- **Central Limit Theorem (CLT):** States that, for a large enough sample size, the distribution
of the sample mean will be approximately normally distributed, regardless of the distribution of the
original population.

These descriptive statistical methods provide valuable insights into the characteristics of a dataset,
helping researchers and analysts summarize and interpret data effectively. They are essential for
understanding the basic properties of data before moving on to more advanced statistical analyses.
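The brief sketch below computes most of the measures listed above for a small, made-up sample; it assumes the pandas library is available.

```python
# Minimal sketch: descriptive statistics for a made-up sample using pandas.
import pandas as pd

sales = pd.Series([12, 15, 15, 18, 21, 24, 30, 45])   # hypothetical daily sales

print("Mean:", sales.mean())
print("Median:", sales.median())
print("Mode:", sales.mode().tolist())
print("Range:", sales.max() - sales.min())
print("Variance:", sales.var())                        # sample variance (n - 1 denominator)
print("Standard deviation:", sales.std())
print("Quartiles:", sales.quantile([0.25, 0.5, 0.75]).tolist())
print("Skewness:", sales.skew())
print("Kurtosis:", sales.kurt())                        # excess kurtosis vs. a normal distribution
```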
Review of probability distribution and data modelling
**Review of Probability Distribution:**

**Definition:**
A probability distribution describes how the values of a random variable are spread or distributed
across different outcomes. It provides the likelihood of each possible outcome in a sample space.
**Key Concepts:**
1. **Discrete Probability Distribution:**
- Describes the probabilities associated with discrete random variables. The probabilities are
assigned to individual values.
2. **Continuous Probability Distribution:**
- Describes the probabilities associated with continuous random variables. Instead of individual
values, probabilities are assigned to ranges of values.
3. **Probability Mass Function (PMF):**
- For discrete random variables, the probability mass function gives the probability of each
possible value. It is often denoted as P(X = x).
4. **Probability Density Function (PDF):**
- For continuous random variables, the probability density function gives the probability density
at a given point. The probability of an event occurring within a given range is found by integrating
the PDF over that range.
5. **Cumulative Distribution Function (CDF):**
- The cumulative distribution function gives the probability that a random variable takes a value
less than or equal to a given value. It is denoted as F(x) for both discrete and continuous random
variables.
6. **Expected Value (Mean):**
- Represents the average of a random variable's possible values, weighted by their probabilities. For a discrete variable X, it is calculated as \(E(X) = \sum_{x} x \, P(X = x)\), and for a continuous variable, it is \(E(X) = \int_{-\infty}^{\infty} x \, f(x)\, dx\).
7. **Variance and Standard Deviation:**
- Variance measures the spread of values around the mean. For a discrete variable X, \(\mathrm{Var}(X) = \sum_{x} (x - \mu)^2 \, P(X = x)\), and for a continuous variable, it is \(\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx\). Standard deviation is the square root of the variance.
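The sketch below evaluates a few of these quantities for two common distributions with made-up parameters; it assumes the scipy library is available.

```python
# Minimal sketch: PMF/PDF, CDF, expected value, and variance with scipy.stats.
from scipy import stats

# Discrete example: Binomial B(n = 10, p = 0.3)
binom = stats.binom(n=10, p=0.3)
print("P(X = 3):", binom.pmf(3))          # probability mass function
print("P(X <= 3):", binom.cdf(3))         # cumulative distribution function
print("E(X):", binom.mean(), " Var(X):", binom.var())

# Continuous example: Normal N(mu = 100, sigma = 15)
norm = stats.norm(loc=100, scale=15)
print("Density at 110:", norm.pdf(110))   # probability density function
print("P(X <= 110):", norm.cdf(110))
```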

**Review of Data Modeling:**


**Definition:**
Data modeling is the process of creating a representation of real-world entities and their
relationships to understand, analyze, and communicate information. In statistics, modeling involves
creating mathematical or statistical representations of data.
**Key Concepts:**
1. **Linear Regression:**
- A statistical method for modeling the relationship between a dependent variable and one or more
independent variables. The model assumes a linear relationship and aims to find the best-fitting
line.
2. **Logistic Regression:**
- Used for modeling the probability of a binary outcome. It models the relationship between a
dependent binary variable and one or more independent variables using the logistic function.
3. **Time Series Analysis:**
- Modeling data points collected over time to understand patterns, trends, and seasonality.
Methods include autoregressive integrated moving average (ARIMA) and seasonal decomposition
of time series (STL).
4. **Decision Trees:**
- A tree-like model representing decisions and their possible consequences, including chance
event outcomes, resource costs, and utility. It is commonly used in classification problems.
5. **Cluster Analysis:**
- Grouping similar data points into clusters to identify patterns and relationships. Common
algorithms include k-means clustering and hierarchical clustering.
6. **Bayesian Modeling:**
- Incorporates Bayesian statistics to update probabilities based on prior knowledge and new
evidence. Bayesian models are particularly useful when dealing with uncertainty and incorporating
prior beliefs.
7. **Machine Learning Models:**
- Various machine learning algorithms, such as support vector machines, neural networks, and
random forests, are used for predictive modeling and classification tasks.
8. **Model Evaluation and Validation:**
- Involves assessing the performance of a model using metrics like accuracy, precision, recall, and
F1 score. Cross-validation techniques help ensure the model's generalizability to new data.

In both probability distribution and data modeling, the key is to create accurate and meaningful
representations of the underlying processes. Probability distributions describe uncertainty, while
data modeling provides a framework for understanding and predicting real-world phenomena based
on observed data. Together, they form the foundation for statistical analysis and decision-making in
various fields.
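As one hedged example of the models above, the sketch below runs k-means clustering (item 5) on a tiny, made-up customer table; it assumes scikit-learn is installed.

```python
# Minimal k-means clustering sketch on hypothetical customer data.
import numpy as np
from sklearn.cluster import KMeans

# Columns: [annual spend in $000s, store visits per month] (made-up values)
customers = np.array([
    [12, 2], [15, 3], [14, 2],      # low spend, infrequent visitors
    [40, 8], [42, 9], [38, 7],      # high spend, frequent visitors
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print("Cluster labels:", kmeans.labels_)
print("Cluster centers:", kmeans.cluster_centers_)
```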

Sampling and Estimation Methods Overview


**Overview of Sampling:**

Sampling is the process of selecting a subset of elements from a larger population to make
inferences about the entire population. It is a crucial step in statistical analysis and research, as it is
often impractical or impossible to study an entire population. Different sampling methods are used,
depending on the research goals and characteristics of the population. Here are some common
sampling methods:

1. **Simple Random Sampling:**
- Each member of the population has an equal chance of being selected. This is usually done using random number generators or a random process.
2. **Stratified Sampling:**
- The population is divided into subgroups or strata based on certain characteristics, and then
random samples are taken from each stratum. This ensures representation from different subgroups.
3. **Systematic Sampling:**
- Every kth member of the population is selected after a random start. It's suitable when the
population is ordered in some way, and a systematic approach is practical.
4. **Cluster Sampling:**
- The population is divided into clusters, and entire clusters are randomly selected. Then, all
members within the selected clusters are included in the sample.
5. **Convenience Sampling:**
- Selection of the easiest and most convenient elements for the study. While quick and cost-
effective, this method may lead to biased samples.
6. **Snowball Sampling:**
- Existing study subjects recruit future subjects from among their acquaintances. This method is
often used in studies where the target population is hard to reach.
7. **Quota Sampling:**
- The researcher selects a sample that reflects the characteristics of the whole population. It is
similar to stratified sampling but does not involve random selection.
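The sketch below illustrates simple random and stratified sampling with pandas on a made-up customer table; the column names and strata are hypothetical.

```python
# Minimal sampling sketch: simple random and stratified samples with pandas.
import pandas as pd

population = pd.DataFrame({
    "customer_id": range(1, 101),
    "region": ["North", "South", "East", "West"] * 25,   # the strata
})

# Simple random sampling: 10 customers, each equally likely to be chosen
simple = population.sample(n=10, random_state=1)

# Stratified sampling: draw 10% from each region stratum
stratified = (population.groupby("region", group_keys=False)
                        .apply(lambda g: g.sample(frac=0.10, random_state=1)))

print(simple["region"].value_counts())
print(stratified["region"].value_counts())
```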

**Overview of Estimation Methods:**

Estimation involves using sample data to make inferences or predictions about population
parameters. Two main types of estimation are point estimation and interval estimation.
1. **Point Estimation:**
- Provides a single, specific value as an estimate for the population parameter. The sample mean
(\(\bar{X}\)) is a common point estimator for the population mean (\(\mu\)).
2. **Interval Estimation:**
- Provides a range of values within which the population parameter is likely to fall. It involves
constructing confidence intervals. The margin of error is a key component of interval estimation.
3. **Confidence Intervals:**
- A range of values constructed around a point estimate, providing a level of confidence that the
true population parameter falls within that range. Common confidence levels include 90%, 95%,
and 99%.
4. **Margin of Error:**
- The range above and below a point estimate within which the true parameter value is likely to
fall. It is influenced by the confidence level and variability in the sample.
5. **Hypothesis Testing:**
- While not a traditional estimation method, hypothesis testing is closely related. It involves
making a decision about a population parameter based on sample data and a null hypothesis. The
outcome of a hypothesis test can inform estimation.
6. **Maximum Likelihood Estimation (MLE):**
- A method for estimating the parameters of a statistical model. It seeks the parameter values that
maximize the likelihood function, representing the probability of observing the given sample.
7. **Bayesian Estimation:**
- Involves updating probability estimates based on prior knowledge and new evidence. It
combines prior beliefs (prior distribution) with the likelihood of observed data to obtain a posterior
distribution.
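A short sketch of point and interval estimation on a made-up sample, using scipy's t distribution, is shown below (assumes numpy and scipy are installed).

```python
# Minimal estimation sketch: point estimate and 95% confidence interval for a mean.
import numpy as np
from scipy import stats

sample = np.array([48, 52, 50, 47, 53, 49, 51, 50, 46, 54])  # hypothetical measurements

x_bar = sample.mean()                 # point estimate of the population mean
sem = stats.sem(sample)               # standard error of the mean
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=x_bar, scale=sem)

print("Point estimate:", x_bar)
print("95% confidence interval:", (round(low, 2), round(high, 2)))
```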
Sampling and estimation methods are fundamental components of statistical analysis. Careful
consideration of the sampling method and appropriate use of estimation techniques contribute to the
validity and reliability of research findings.
Unit 2:
Trendlines and Regression Analysis: Modelling Relationships and Trends in Data, Simple Linear Regression. Important Resources, Business Analytics Personnel, Data and Models for Business Analytics, Problem Solving, Visualizing and Exploring Data, Business Analytics Technology.
Modelling Relationships and Trends in Data
Modeling relationships and trends in data is a fundamental aspect of statistical analysis and data
science. It involves developing mathematical or statistical representations that capture the
underlying patterns and structures in the data. Here are several methods and techniques commonly
used for modeling relationships and trends:
1. **Linear Regression:**
- **Purpose:** Modeling a linear relationship between an independent variable (or multiple
variables) and a dependent variable.
- **Equation:** \(y = mx + b\), where \(y\) is the dependent variable, \(x\) is the independent variable, \(m\) is the slope, and \(b\) is the y-intercept.
2. **Polynomial Regression:**
- **Purpose:** Extending linear regression to model relationships with higher degrees. Useful
when a curve is a better fit for the data.
- **Equation:** \(y = f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \ldots + \beta_d x^d + \epsilon\)
3. **Exponential and Logarithmic Models:**
- **Exponential Model:** Describes exponential growth or decay.
- **Equation:** \(y = ab^x\), where \(a\) is the initial value, \(b\) is the growth or decay factor,
and \(x\) is the independent variable.
- **Logarithmic Model:** Useful for data that exhibits logarithmic trends.
- **Equation:** \(y = a + b \log(x)\).
4. **Time Series Models:**
- **Purpose:** Modeling trends, seasonality, and cyclic patterns in time-ordered data.
- **Methods:** Autoregressive Integrated Moving Average (ARIMA), Seasonal Decomposition
of Time Series (STL), and more.
5. **Nonlinear Least Squares:**
- **Purpose:** Fitting a model to data where the relationship is not explicitly defined.
- **Methods:** Minimizing the sum of the squares of the differences between observed and
predicted values.
6. **Splines and Piecewise Regression:**
- **Purpose:** Capturing complex trends by fitting multiple simpler models to different
segments of the data.
- **Methods:** Piecewise linear regression or using spline functions.
7. **Generalized Additive Models (GAMs):**
- **Purpose:** Extending linear models to incorporate smooth functions of predictors. Useful
for capturing non-linear relationships.
- **Equation:** \(y = \beta_0 + f_1(x_1) + f_2(x_2) + \ldots + f_k(x_k) + \epsilon\), where
\(f_i(x_i)\) are smooth functions.
8. **Machine Learning Models:**
- **Purpose:** Predicting outcomes based on complex relationships and patterns in the data.
- **Methods:** Decision Trees, Random Forests, Support Vector Machines, Neural Networks,
etc.
9. **Bayesian Modeling:**
- **Purpose:** Incorporating prior knowledge into the modeling process.
- **Methods:** Bayesian Linear Regression, Bayesian Neural Networks, etc.
10. **Quantile Regression:**
- **Purpose:** Modeling relationships at different quantiles of the data distribution.
- **Equation:** \(Q_\tau(y) = X\beta_\tau\), where \(Q_\tau(y)\) is the \(\tau\)-th quantile of
\(y\).

When selecting a modeling approach, it's essential to consider the nature of the data, the assumed
relationship, and the interpretability of the model. It's also crucial to validate models using
techniques like cross-validation and assess model performance against relevant metrics. The choice
of modeling technique depends on the specific characteristics of the dataset and the goals of the
analysis.
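As a hedged illustration of trend modelling, the sketch below fits the degree-2 polynomial model from item 2 above to made-up quarterly sales using numpy.

```python
# Minimal polynomial trend sketch: fit y = b0 + b1*x + b2*x^2 to hypothetical sales.
import numpy as np

quarters = np.arange(1, 9)                           # time index 1..8
sales = np.array([10, 12, 15, 19, 24, 30, 37, 45])   # made-up quarterly sales

coeffs = np.polyfit(quarters, sales, deg=2)          # highest-degree coefficient first
trend = np.poly1d(coeffs)

print("Fitted coefficients:", coeffs)
print("Forecast for quarter 9:", trend(9))
```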
Simple Linear Regression
**Simple Linear Regression: Overview**
Simple Linear Regression is a statistical method used to model the relationship between a single
independent variable (\(X\)) and a dependent variable (\(Y\)) by fitting a linear equation to the
observed data. The goal is to find the best-fitting straight line that minimizes the difference between
the observed values and the values predicted by the line. The equation for a simple linear regression
model is represented as:
\[ Y = \beta_0 + \beta_1X + \varepsilon \]
Here's an overview of the key components:
1. **Variables:**
- \( Y \): Dependent variable (response or outcome).
- \( X \): Independent variable (predictor or feature).
- \( \beta_0 \): Y-intercept, representing the value of \( Y \) when \( X \) is 0.
- \( \beta_1 \): Slope of the line, indicating the change in \( Y \) for a one-unit change in \( X \).
- \( \varepsilon \): Error term, accounting for unobserved factors affecting \( Y \) that are not
explained by the linear relationship with \( X \).

2. **Objective:**
- Minimize the sum of squared differences between the observed \( Y \) values and the values
predicted by the linear equation.

3. **Fitting the Line:**


- The coefficients \( \beta_0 \) and \( \beta_1 \) are estimated using statistical methods (commonly
the method of least squares).
- The fitted regression line is expressed as \( \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1X \), where
the hats (\( \hat{} \)) denote estimated values.
4. **Interpretation of Coefficients:**
- \( \hat{\beta}_0 \): The intercept represents the predicted value of \( Y \) when \( X \) is 0.
- \( \hat{\beta}_1 \): The slope represents the change in the predicted \( Y \) for a one-unit change
in \( X \).
5. **Assumptions:**
- Linearity: The relationship between \( X \) and \( Y \) is linear.
- Independence: Observations are independent of each other.
- Homoscedasticity: Residuals (differences between observed and predicted values) have
constant variance.
- Normality: Residuals are normally distributed.
- No Perfect Multicollinearity: In multiple regression, the independent variables must not be perfectly correlated; with a single predictor, as in simple linear regression, this assumption is satisfied automatically.
6. **Goodness of Fit:**
- The coefficient of determination (\( R^2 \)) measures the proportion of the variance in \( Y \)
explained by the linear relationship with \( X \).
7. **Prediction:**
- Once the regression line is fitted, it can be used to make predictions for the dependent variable
based on new values of the independent variable.
8. **Hypothesis Testing:**
- Hypothesis tests can be conducted to determine the significance of the estimated coefficients
and assess the overall fit of the model.

In summary, simple linear regression is a powerful tool for understanding and quantifying the
relationship between two variables. It provides a straightforward approach to modeling and making
predictions based on observed data. However, its applicability is limited to situations where a linear
relationship between the variables is reasonable.
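The sketch below fits a simple linear regression of this form with scipy.stats.linregress; the advertising and sales figures are invented for illustration.

```python
# Minimal simple linear regression sketch on made-up advertising/sales data.
from scipy import stats

ad_spend = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]   # X: advertising spend ($000s)
sales = [2.1, 2.9, 3.8, 4.3, 5.2, 5.8, 6.9]      # Y: sales ($000s)

result = stats.linregress(ad_spend, sales)

print("Intercept (beta_0 hat):", result.intercept)
print("Slope (beta_1 hat):", result.slope)
print("R-squared:", result.rvalue ** 2)
print("p-value for the slope:", result.pvalue)

# Predict Y for a new X value using the fitted line
x_new = 5.0
print("Predicted sales at x = 5.0:", result.intercept + result.slope * x_new)
```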

IMPORTANT RESOURCES:
It is necessary to understand the resource needs of a BA program to better comprehend the
value of the information that BA provides. The need for BA resources varies by firm to meet
particular decision support requirements. Some firms may choose to have a modest investment,
whereas other firms may have BA teams or a department of BA specialists. Regardless of the level
of resource investment, at minimum, a BA program requires resource investments in BA personnel,
data, and technology.
(1) Business Analytics Personnel
(2) Business analytics technology
(3) Business Analytics Data
Structured and unstructured data is needed to generate analytics. As a beginning for organizing data into an understandable framework, statisticians usually categorize data into meaningful groups.

Table 3.4 Typical Internal Sources of Data on Which Business Analytics Can Be Based
Table 3.5 Typical External Sources of Data on Which Business Analytics Can Be Based

(A) Categorizing Data:

There are many ways to categorize business analytics data. Data is commonly categorized by either internal or external sources. Typical examples of internal data sources include those presented in Table 3.4. When firms try to solve internal production or service operations problems, internally sourced data may be all that is needed. Typical external sources of data (see Table 3.5) are numerous and provide great diversity and unique challenges for BA to process. Data can be measured quantitatively (for example, sales dollars) or qualitatively by preference surveys (for example, products compared based on consumers preferring one product over another) or by the amount of consumer discussion (chatter) on the Web regarding the pluses and minuses of competing products.

A major portion of the external data sources are found in the literature. For example, the US Census and the International Monetary Fund (IMF) are useful data sources at the macroeconomic level for model building.

(B) DATA ISSUES:

There are a couple of data issues that are critical to the usability of any database or data file. Those issues are data quality and data privacy.
(A) Data quality:
Data quality can be defined as data that serves the purpose for which it is collected. It means different things for different applications, but there are some commonalities of high-quality data. These qualities usually include accurately representing reality, measuring what it is supposed to measure, being timely, and having completeness. When data is of high quality, it helps ensure competitiveness, aids customer service, and improves profitability. When data is of poor quality, it can provide information that is contradictory, leading to misguided decision-making.

For example, having missing data in files can prohibit some forms of statistical modeling, and incorrect coding of information can render databases completely useless. Data quality requires effort on the part of data managers to cleanse data of erroneous information and repair or replace missing data.

(B) Data privacy:
Data privacy refers to the protection of shared data such that access is permitted only to those users for whom it is intended. It is a security issue that requires balancing the need to know with the risks of sharing too much.

There are many risks in leaving unrestricted access to a company’s database.


For example, competitors can steal a firm’s customers by accessing addresses. Data
leaks on product quality failures can damage brand image, and customers can
become distrustful of a firm that shares information given in confidence. To avoid
these issues, a firm needs to abide by the current legislation regarding customer
privacy and develop a program devoted to data privacy.
Collecting and retrieving data and computing analytics requires the use of computers and information technology. A large part of what BA personnel do is related to managing information systems to collect, process, store, and retrieve data from various sources.
BUSINESS ANALYTICS PERSONNEL:

One way to identify personnel needed for BA staff is to examine what is required for certification in BA by organizations that provide BA services. INFORMS, a major academic and professional organization, announced the startup of a Certified Analytics Professional (CAP) program in 2013.
Another, more established organization, Cognizure, offers a variety of service products, including business analytic services. It offers a general certification Business Analytics Professional (BAP) exam that measures existing skill sets in BA staff and identifies areas needing improvement. This is a tool to validate technical proficiency, expertise, and professional standards in BA. The certification consists of three exams covering the content areas listed in Table 3.1.

Table 3.1 Cognizure Organization Certification Exam Content Areas

Most of the content areas in Table 3.1 will be discussed and illustrated in subsequent chapters and appendixes. The three exams required in the Cognizure certification program can easily be understood in the context of the three steps of the BA process (descriptive, predictive, and prescriptive).
The topics in Figure 3.1 of the certification program are applicable to the three
major steps in the BA process. The basic statistical tools apply to the descriptive analytics
step, the more advanced statistical tools apply to the predictive analytics step, and the
operations research tools apply to the prescriptive analytics step. Some of the tools can be
applied to both the descriptive and the predictive steps.

Likewise, tools like simulation can be applied to answer questions in both the predictive and the prescriptive steps, depending on how they're used. At the conjunction of all the tools is the reality of case studies. The use of case studies is designed to provide practical experience where all tools are employed to answer important questions or seek opportunities.

Figure 3.1 Certification content areas and their relationship to the steps in BA

These certification programs also include specialized skill sets related to BA personnel (administrators, designers, developers, solution experts, and specialists), as presented in Table 3.2.
Table 3.2 Types of BA Personnel

With the variety of positions and roles participants play in the BA process, the question arises of what skill sets or competencies are needed to function in BA. In a general sense, BA positions require competencies in business, analytic, and information systems skills. As listed in Table 3.3, business skills involve basic management of people and processes. BA personnel must communicate with BA staffers within the organization (the BA team members) and the other functional areas within a firm (BA customers and users) to be useful. Because they serve a variety of functional areas within a firm, BA personnel need to possess customer service skills so they can interact with the firm's personnel and understand the nature of the problems they seek to solve. BA personnel also need to sell their services to users inside the firm. In addition, some must lead a BA team or department, which requires considerable interpersonal management and leadership skills and abilities.
Table 3.3 Select Types of BA Personnel Skills or Competency Requirements
Fundamental to BA is an understanding of the analytic methodologies listed in Table 3.1 and others not listed. In addition to any tool sets, there is a need to know how they are integrated into the BA process to leverage data (structured or unstructured) and obtain the information desired by the customers who will be guided by the analytics.

DATA FOR BUSINESS ANALYTICS:


Data are numerical facts and figures that are collected through some type of measurement
process. Information comes from analyzing data—that is, extracting meaning from data to
support evaluation and decision making.
Data are used in virtually every major function in a business. Modern organizations— which
include not only for-profit businesses but also nonprofit organizations—need good data to
support a variety of company purposes, such as planning, reviewing company performance,
improving operations, and comparing company performance with competitors’ or best-practice
benchmarks. Some examples of how data are used in business include the following:
• Annual reports summarize data about companies’ profitability and market share both in numerical form and in charts and graphs to communicate with shareholders.
• Accountants conduct audits to determine whether figures reported on a firm’s balance sheet fairly represent the actual data by examining samples (that is, subsets) of accounting data, such as accounts receivable.
• Financial analysts collect and analyze a variety of data to understand the contribution that a business provides to its shareholders. These typically include profitability, revenue growth, return on investment, asset utilization, operating margins, earnings per share, economic value added (EVA), shareholder value, and other relevant measures.
• Economists use data to help companies understand and predict population trends, interest rates, industry performance, consumer spending, and international trade. Such data are often obtained from external sources such as Standard & Poor’s Compustat data sets, industry trade associations, or government databases.
• Marketing researchers collect and analyze extensive customer data. These data often consist of demographics, preferences and opinions, transaction and payment history, shopping behavior, and a lot more. Such data may be collected by surveys, personal interviews, focus groups, or from shopper loyalty cards.
• Operations managers use data on production performance, manufacturing quality, delivery times, order accuracy, supplier performance, productivity, costs, and environmental compliance to manage their operations.
• Human resource managers measure employee satisfaction, training costs, turnover, market innovation, training effectiveness, and skills development.
Data Sets and Databases:
A data set is simply a collection of data. Marketing survey responses, a table of historical stock prices, and a collection of measurements of dimensions of a manufactured item are examples of data sets. A database is a collection of related files containing records on people, places, or things. The people, places, or things for which we store and maintain information are called entities. A database for an online retailer that sells instructional fitness books and DVDs, for instance, might consist of a file for three entities: publishers from which goods are purchased, customer sales transactions, and product inventory. A database file is usually organized in a two-dimensional table, where the columns correspond to each individual element of data (called fields, or attributes), and the rows represent records of related data elements. A key feature of computerized databases is the ability to quickly relate one set of files to another.
Databases are important in business analytics for accessing data, making queries, and other data and information management activities. Software such as Microsoft Access provides powerful analytical database capabilities. However, in this book, we won’t be delving deeply into databases or database management systems but will work with individual database files or simple data sets. Because spreadsheets are convenient tools for storing and manipulating data sets and database files, we will use them for all examples and problems. Big data, by contrast, requires advanced analytics tools such as data mining and text analytics, and new technologies such as cloud computing, faster multi-core processors, large memory spaces, and solid-state drives.
Metrics and Data Classification:
A metric is a unit of measurement that provides a way to objectively quantify performance. For example, senior managers might assess overall business performance using such metrics as net profit, return on investment, market share, and customer satisfaction. A plant manager might monitor such metrics as the proportion of defective parts produced or the number of inventory turns each month. For a Web-based retailer, some useful metrics are the percentage of orders filled accurately and the time taken to fill a customer’s order. Measurement is the act of obtaining data associated with a metric. Measures are numerical values associated with a metric.
Metrics can be either discrete or continuous. A discrete metric is one that is derived from counting something. For example, a delivery is either on time or not; an order is complete or incomplete; or an invoice can have one, two, three, or any number of errors. Some discrete metrics associated with these examples would be the proportion of on-time deliveries, the number of incomplete orders each day, and the number of errors per invoice. Continuous metrics are based on a continuous scale of measurement. Any metrics involving dollars, length, time, volume, or weight, for example, are continuous.
Another classification of data is by the type of measurement scale. Data may be classified into four groups:
(A) Categorical (nominal) data, which are sorted into categories according to specified characteristics. For example, a firm’s customers might be classified by their geographical region (North America, South America, Europe, and Pacific); employees might be classified as managers, supervisors, and associates. The categories bear no quantitative relationship to one another, but we usually assign an arbitrary number to each category to ease the process of managing the data and computing statistics. Categorical data are usually counted or expressed as proportions or percentages.
(B) Ordinal data, which can be ordered or ranked according to some relationship to one another. College football or basketball rankings are ordinal; a higher ranking signifies a stronger team but does not specify any numerical measure of strength. Ordinal data are more meaningful than categorical data because data can be compared to one another. A common example in business is data from survey scales, such as rating a service as poor, average, good, very good, or excellent. Such data are categorical but also have a natural order (excellent is better than very good) and, consequently, are ordinal. However, ordinal data have no fixed units of measurement, so we cannot make meaningful numerical statements about differences between categories. Thus, we cannot say that the difference between excellent and very good is the same as between good and average, for example. Similarly, a team ranked number 1 may be far superior to the number 2 team, whereas there may be little difference between teams ranked 9th and 10th.
(C) Interval data, which are ordinal but have constant differences between observations and have arbitrary zero points. Common examples are time and temperature. Time is relative to global location, and calendars have arbitrary starting dates (compare, for example, the standard Gregorian calendar with the Chinese calendar). Both the Fahrenheit and Celsius scales represent a specified measure of distance (degrees) but have arbitrary zero points. Thus we cannot take meaningful ratios; for example, we cannot say that 50 degrees is twice as hot as 25 degrees. However, we can compare differences. Another example is SAT or GMAT scores. The scores can be used to rank students, but only differences between scores provide information on how much better one student performed over another; ratios make little sense. In contrast to ordinal data, interval data allow meaningful comparison of ranges, averages, and other statistics.
(D) Ratio data, which are continuous and have a natural zero. Most business and economic data,
such as dollars and time, fall into this category. For example, the measure dollars has an
absolute zero. Ratios of dollar figures are meaningful. For example, knowing that the Seattle
region sold $12 million in March whereas the Tampa region sold $6 million means that Seattle
sold twice as much as Tampa.

DATA RELIABILITY AND VALIDITY:


Poor data can result in poor decisions. In one situation, a distribution system design model relied
on data obtained from the corporate finance department. Transportation costs were determined
using a formula based on the latitude and longitude of the locations of plants and customers.
But when the solution was represented on a geographic information system (GIS) mapping
program, one of the customers was in the Atlantic Ocean.
Thus, data used in business decisions need to be reliable and valid. Reliability means that data
are accurate and consistent. Validity means that data correctly measure what they are supposed
to measure.
Data and models for Business analytics,
In business analytics, the use of data and models is fundamental for extracting insights, making
informed decisions, and gaining a competitive advantage. Here's an overview of how data and
models are utilized in business analytics:
### Data in Business Analytics:
1. **Data Collection:**
- Businesses collect data from various sources, including customer interactions, sales
transactions, social media, surveys, and more.
- Data can be categorized as structured (in databases) or unstructured (text, images, videos).
2. **Data Cleaning and Preprocessing:**
- Raw data often requires cleaning to handle missing values, outliers, and inconsistencies.
- Preprocessing involves transforming and organizing data for analysis, including
normalization and feature engineering.
3. **Exploratory Data Analysis (EDA):**
- EDA involves visualizing and summarizing data to identify patterns, trends, and
relationships.
- Techniques include histograms, scatter plots, and summary statistics.
4. **Descriptive Analytics:**
- Descriptive analytics provides a snapshot of historical data to understand what has
happened.
- Key performance indicators (KPIs) and metrics are used to summarize and interpret data.
5. **Predictive Analytics:**
- Predictive analytics involves using statistical algorithms and machine learning models to
forecast future trends and outcomes.
- Techniques include regression analysis, time series forecasting, and machine learning
algorithms.
6. **Prescriptive Analytics:**
- Prescriptive analytics recommends actions based on predictions. It suggests the best course
of action to achieve a desired outcome.
- Optimization and simulation models are commonly used in prescriptive analytics.
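A minimal pandas sketch of the first few steps above (collection, cleaning, and summary), using a tiny made-up table rather than any real data source:

```python
import pandas as pd

# A small invented transaction table standing in for collected business data.
df = pd.DataFrame({
    "region":  ["North", "South", "North", "West", "South", "North"],
    "revenue": [120.0, 95.5, None, 210.0, 88.0, 132.5],
})

# Data cleaning and preprocessing: check for and handle missing values.
print(df.isna().sum())
df["revenue"] = df["revenue"].fillna(df["revenue"].median())

# Exploratory and descriptive analysis: summary statistics and a simple KPI.
print(df.describe())
print(df.groupby("region")["revenue"].sum())
```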
### Models in Business Analytics:
1. **Regression Models:**
- Used to model the relationship between dependent and independent variables.
- Simple and multiple linear regression, logistic regression for classification.
2. **Time Series Models:**
- Used for analyzing and forecasting time-ordered data.
- ARIMA (Autoregressive Integrated Moving Average) and Exponential Smoothing are
common time series models.
3. **Machine Learning Models:**
- Supervised Learning: Algorithms like decision trees, random forests, support vector
machines for classification and regression.
- Unsupervised Learning: Clustering algorithms (k-means, hierarchical clustering) for
segmenting data.
- Neural networks for complex pattern recognition.
4. **Decision Trees and Random Forests:**
- Decision trees represent decisions and their possible consequences.
- Random forests are an ensemble of decision trees, providing more robust predictions.
5. **Clustering Models:**
- Used to identify natural groupings in data.
- K-means clustering, hierarchical clustering, and DBSCAN are common clustering
algorithms.
6. **Text Analytics and Natural Language Processing (NLP):**
- Analyzing and extracting insights from unstructured text data.
- Sentiment analysis, topic modeling, and named entity recognition.
7. **Simulation Models:**
- Used for mimicking real-world processes and experimenting with different scenarios.
- Monte Carlo simulation is a common technique for risk analysis.
8. **Optimization Models:**
- Used for finding the best solution to a problem with constraints.
- Linear programming, integer programming, and nonlinear optimization models.

### Integration of Data and Models:


1. **Model Training:**
- Models are trained using historical data, and the parameters are optimized to make accurate
predictions.
2. **Model Evaluation:**
- Models are evaluated using metrics such as accuracy, precision, recall, and F1 score for
classification, and Mean Squared Error (MSE) for regression.
3. **Deployment and Integration:**
- Successful models are deployed into business processes for real-time decision-making.
- Integration with business systems allows for seamless utilization of analytic insights.
4. **Continuous Improvement:**
- Models are monitored for performance, and updates are made as new data becomes
available.
- Continuous feedback loops ensure models remain accurate and relevant.
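As a rough illustration of the training-and-evaluation cycle just described, the sketch below uses scikit-learn on synthetic data; the variables and coefficients are invented for the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic records standing in for historical business data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))                         # e.g., spend on three channels
y = 5 + X @ np.array([0.4, 1.2, 0.1]) + rng.normal(0, 5, 200)  # sales with noise

# Hold out data so evaluation reflects performance on unseen records.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)   # model training
pred = model.predict(X_test)                       # model evaluation
print("MSE on held-out data:", mean_squared_error(y_test, pred))
```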

In summary, the combination of data and models in business analytics enables organizations to
gain valuable insights, optimize processes, and make data-driven decisions for improved
performance and competitiveness.
Visualizing and Exploring Data,
Visualizing and exploring data are essential steps in the data analysis process. These activities
help uncover patterns, trends, and relationships within the data, making it easier to derive
meaningful insights. Here are various techniques and tools for visualizing and exploring data:
### 1. **Descriptive Statistics:**
- Use summary statistics (mean, median, standard deviation) to understand the central
tendency and variability of the data.
- Identify outliers and anomalies that may require further investigation.
### 2. **Histograms:**
- Create histograms to visualize the distribution of a single variable.
- Understand the frequency and density of data points within different ranges.
### 3. **Box Plots (Box-and-Whisker Plots):**
- Display the distribution of a dataset and identify outliers.
- Show quartiles, median, and potential skewness.
### 4. **Scatter Plots:**
- Explore relationships between two continuous variables.
- Identify patterns, correlations, and potential outliers.
### 5. **Pair Plots:**
- Visualize pairwise relationships between multiple variables.
- Helpful for identifying patterns and correlations in multivariate datasets.
### 6. **Correlation Heatmaps:**
- Display correlation coefficients between variables using color intensity.
- Quickly identify strong positive or negative correlations.
### 7. **Line Charts:**
- Show trends in data over time or across a continuous variable.
- Useful for time-series data and continuous variables.
### 8. **Bar Charts:**
- Display the distribution of a categorical variable.
- Compare the frequency or proportion of different categories.
### 9. **Pie Charts:**
- Illustrate the proportion of each category in a whole.
- Useful for displaying parts of a whole (percentages).
### 10. **Area Charts:**
- Show the cumulative contribution of different variables over time.
- Effective for visualizing trends and patterns in cumulative data.
### 11. **Violin Plots:**
- Combine aspects of box plots and kernel density plots to display the distribution of data.
- Useful for comparing distributions across categories.
### 12. **Word Clouds:**
- Visualize word frequency in textual data.
- Words are displayed with sizes proportional to their frequencies.
### 13. **Geospatial Maps:**
- Use maps to visualize data with a geographic component.
- Display data points or aggregated values on a map.
### 14. **Interactive Dashboards:**
- Create interactive dashboards using tools like Tableau, Power BI, or Plotly.
- Allow users to explore data dynamically by adjusting parameters.
### 15. **3D Plots:**
- Visualize relationships in three-dimensional space.
- Useful when exploring interactions between three variables.
### 16. **Parallel Coordinates Plots:**
- Display multivariate data by representing each observation as a line.
- Useful for visualizing relationships between multiple variables.
### 17. **Network Graphs:**
- Visualize relationships between entities in a network.
- Nodes represent entities, and edges represent connections.
### 18. **Time Series Decomposition:**
- Decompose time-series data into trend, seasonality, and residual components.
- Understand the underlying patterns in time-dependent data.
### 19. **Distribution Plots (e.g., KDE Plots):**
- Visualize the probability distribution of a continuous variable.
- Kernel Density Estimation (KDE) plots provide a smooth estimate of the distribution.
### 20. **Treemaps:**
- Represent hierarchical data structures using nested rectangles.
- Visualize the proportion of each category within a hierarchy.
### Tools for Data Visualization:
- **Matplotlib:** A popular Python library for creating static, animated, and interactive
visualizations.
- **Seaborn:** Built on top of Matplotlib, Seaborn provides a high-level interface for
statistical data visualization.
- **Plotly:** Offers interactive and dynamic visualizations, including charts and dashboards.
- **Tableau:** A powerful data visualization tool that allows users to create interactive and
shareable dashboards.
- **Power BI:** A business analytics service by Microsoft for creating interactive reports
and dashboards.
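A short sketch of how a few of these charts might be produced with Matplotlib and Seaborn (the DataFrame and column names are synthetic, purely for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "ad_spend": rng.uniform(10, 100, 200),
    "sales": rng.uniform(100, 500, 200),
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(df["sales"], bins=20)                  # histogram of one variable
axes[0].set_title("Sales distribution")
axes[1].scatter(df["ad_spend"], df["sales"], s=10)  # scatter plot of two variables
axes[1].set_title("Ad spend vs. sales")
sns.heatmap(df.corr(), annot=True, ax=axes[2])      # correlation heatmap
axes[2].set_title("Correlations")
plt.tight_layout()
plt.show()
```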
### Best Practices:
- Choose visualizations based on the nature of your data and the insights you want to convey.
- Label axes, provide legends, and add annotations for clarity.
- Consider the audience and purpose of the visualization.
- Iterate and refine visualizations based on feedback and insights gained during exploration.

Remember that effective data visualization is not only about creating aesthetically pleasing
charts but also about conveying information in a clear and insightful manner.

Business Analytics Technology


Firms need an information technology (IT) infrastructure that supports personnel in the conduct
of their daily business operations. The general requirements for such a system are stated in
Table 3.6. These types of technology are elemental needs for business analytics operations.

Table 3.6 General Information Technology (IT) Infrastructure

Of particular importance for BA are the data management technologies listed in Table 3.6.
A database management system (DBMS) is data management software that permits firms to
centralize data, manage it efficiently, and provide access to stored data by application
programs. A DBMS usually serves as an interface between application programs and the
physical data files of structured data. A DBMS makes the task of understanding where and how
the data is actually stored more efficient. In addition, other DBMS systems can handle
unstructured data. For example, object-oriented DBMS systems are able to store and retrieve
unstructured data, such as drawings, images, photographs, and voice data.
These types of technology are necessary to handle the load of big data that most firms
currently collect.
DBMS includes capabilities and tools for organizing, managing, and accessing data
in databases. Four of the more important capabilities are its data definition language, data
dictionary, database encyclopedia, and data manipulation language. DBMS has a data
definition capability to specify the structure of content in a database. This is used to create
database tables and characteristics used in fields to identify content. These tables and
characteristics are critical success factors for search efforts as the database grows in size.
These characteristics are documented in the data dictionary (an automated or manual file
that stores the size, descriptions, format, and other properties needed to characterize data).

The database encyclopedia is a table of contents listing a firm’s current data
inventory and what data files can be built or purchased. The typical content of the database
encyclopedia is presented in Table 3.7. Of particular importance for BA are the data
manipulation language tools included in a DBMS. These tools are used to search databases
for specific information. An example is structured query language (SQL), which allows
users to find specific data through a session of queries and responses in a database.
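To illustrate the SQL idea, the snippet below uses Python's built-in sqlite3 module on an in-memory database; the customer table and its columns are made up for the example, while the SQL statements themselves are standard:

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # a throwaway in-memory database
cur = conn.cursor()

# Data definition: specify the structure of the table.
cur.execute("CREATE TABLE customers (id INTEGER, region TEXT, sales REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "North America", 120.0), (2, "Europe", 80.5), (3, "Europe", 95.0)])

# Data manipulation: an SQL query answering a specific question.
cur.execute("SELECT region, SUM(sales) FROM customers GROUP BY region")
print(cur.fetchall())                       # e.g., [('Europe', 175.5), ('North America', 120.0)]
conn.close()
```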

Table 3.7 Database Encyclopedia Content

Data warehouses are databases that store current and historical data of potential interest
to decision makers. What a data warehouse does is make data available to anyone who
needs access to it. In a data warehouse, the data is prohibited from being altered. Data
warehouses also provide a set of query tools, analytical tools, and graphical reporting
facilities. Some firms use intranet portals to make data warehouse information widely
available throughout a firm.

Data marts are focused subsets or smaller groupings within a data warehouse.
Firms often build enterprise-wide data warehouses, where a central data warehouse
serves the entire organization, and smaller, decentralized data warehouses (called data
marts) are focused on a limited portion of the organization’s data that is placed in a
separate database for a specific population of users. For example, a firm might develop
a smaller database on just product quality to focus efforts on quality customer and
product issues. A data mart can be constructed more quickly and at lower cost than
enterprise-wide data warehouses to concentrate effort in areas of greatest concern.

Online analytical processing (OLAP) is software that allows users to view data in
multiple dimensions. For example, employees can be viewed in terms of their age, sex,
geographic location, and so on. OLAP would allow identification of the number of
employees who are age 35, male, and in the western region of a country. OLAP allows
users to obtain online answers to ad hoc questions quickly, even when the data is stored in
very large databases.
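A multidimensional count like the employee example above can be approximated in pandas with a cross-tabulation; the records below are invented for illustration:

```python
import pandas as pd

employees = pd.DataFrame({
    "age":    [35, 35, 42, 35, 29],
    "sex":    ["M", "M", "F", "M", "F"],
    "region": ["West", "East", "West", "West", "East"],
})

# Count employees along the age, sex, and region dimensions at once.
cube = pd.crosstab(employees["age"], [employees["sex"], employees["region"]])
print(cube)

# The OLAP-style question from the text: how many 35-year-old males in the western region?
print(len(employees.query("age == 35 and sex == 'M' and region == 'West'")))
```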
Data mining is the application of a discovery-driven software process that provides
insights into business data by finding hidden patterns and relationships in big data or
large databases and inferring rules from them to predict future behavior. The observed
patterns and rules are used to guide decision-making. They can also act to forecast the
impact of those decisions.
Text mining is a software application used to extract key elements from unstructured
data sets, discover patterns and relationships in the text materials, and summarize the
information.
Web mining seeks to find patterns, trends, and insights into customer behavior from
users of the Web.
Analysis ToolPak is an Excel add-in that contains a variety of statistical tools (for example,
graphics and multiple regression) for the descriptive and predictive BA process steps. Another
Excel add-in, Solver, contains operations research optimization tools (for example, linear
programming) used in the prescriptive step of the BA process.
Table 3.8 Types of Information Obtainable with Data Mining Technology
Unit 3:
Organization Structures of Business analytics, Team management, Management Issues, Designing
Information Policy, Outsourcing, Ensuring Data Quality, Measuring contribution of Business
analytics, Managing Changes. Descriptive Analytics, predictive analytics, predictive Modelling,
Predictive analytics analysis, Data Mining, Data Mining Methodologies, Prescriptive analytics and
its step in the business analytics Process, Prescriptive Modelling, nonlinear Optimization.
Organization Structures of Business analytics
To successfully implement business analytics (BA) within organizations, the BA function, in
whatever organizational form it takes, must be fully integrated throughout a firm. This requires BA
resources to be aligned in a way that permits a view of customer information within and across all
departments, access to customer information from multiple sources (internal and external to the
organization), access to historical analytics from a central repository, and technology
resources aligned to be accountable for analytic success. The commonality of these requirements is
the desire for an alignment that maximizes the flow of information into and through the BA
operation, which in turn processes and shares information to desired users throughout the
organization.

(A) Most organizations are hierarchical, with senior managers making the strategic planning
decisions, middle-level managers making tactical planning decisions, and lower-level managers
making operational planning decisions. Within the hierarchy, other organizational structures
exist to support the development and existence of groupings of resources like those needed for
BA. These additional structures include programs, projects, and teams. A program in this context
is the process that seeks to create an outcome and usually involves managing several related
projects with the intention of improving organizational performance. A program can also be a
large project. A project tends to deliver outcomes and can be defined as having temporary rather
than permanent social systems within or across organizations to accomplish particular and
clearly defined tasks, usually under time constraints. Projects are often composed of teams. A
team consists of a group of people with skills to achieve a common purpose. Teams are especially
appropriate for conducting complex tasks that have many interdependent subtasks.

The relationship of programs, projects, and teams with a business hierarchy is presented in
Figure 4.1. Within this hierarchy, the organization’s senior managers establish a BA program
initiative to mandate the creation of a BA grouping within the firm as a strategic goal. A BA
program does not always have an end-time limit. Middle-level managers reorganize or break
down the strategic BA program goals into doable BA project initiatives to be undertaken in a
fixed period of time. Some firms have only one project (establish a BA grouping) and others,
depending on the organization structure, have multiple BA projects requiring the creation of
multiple BA groupings. Projects usually have an end date by which to judge the
success of the project. The projects in some cases are further reorganized into smaller
assignments, called BA team initiatives, to operationalize the broader strategy of the BA
program. BA teams may have a long-standing time limit (for example, to exist as the main source
of analytics for an entire organization) or have a fixed period (for example, to work on a specific
product quality problem and then end).

Figure 4.1 Hierarchical relationships: program, project, and team planning

In summary, one way to look at the alignment of BA resources is to view
it as a progression of assigned planning tasks from a BA program, to BA projects, and
eventually to BA teams for implementation. As shown in Figure 4.1, this hierarchical
relationship is a way to examine how firms align planning and decision-making workload
to fit strategic needs and requirements.

BA organization structures usually begin with an initiative that recognizes the need to
use and develop some kind of program in analytics. Fortunately, most firms today recognize
this need. The question then becomes how to match the firm’s needs within the organization to
achieve its strategic, tactical, and operations objectives within resource limitations. Planning
the BA resource allocation within the organizational structure of a firm is a starting place for the
alignment of BA to best serve a firm’s needs.

Aligning the BA resources requires a determination of the amount of resources a firm wants
to invest. The outcome of the resource investment might identify only one individual to compute
analytics for a firm. Because of the varied skill sets in information systems, statistics, and operations
research methods, a more common beginning for a BA initiative is the creation of a BA team
organization structure possessing a variety of analytical and management skills.

(B) Another way of aligning BA resources within an organization is to use a project structure.
Most firms undertake projects, and some firms actually use a project structure for their entire
organization.

In organizations where functional departments are structured on a strict hierarchy, separate
BA departments or teams have to be allocated to each functional area, as presented in Figure 4.2.
This functional organization structure may have the benefit of stricter functional control by the
VPs of an organization and greater efficiency in focusing on just the analytics within each
specialized area. On the other hand, this structure does not promote the cross-department access
that is suggested as a critical success factor for the implementation of a BA program.

Figure 4.2 Functional organization structure with BA

The needs of each firm for BA sometimes dictate positioning BA within existing organization
functional areas. Clearly, many alternative structures can house a BA grouping. For example,
because BA provides information to users, BA could be included in the functional area of
management information systems, with the chief information officer (CIO) acting as both the
director of information systems (which includes database management) and the leader of the
BA grouping.

(C) Another structure, found in large organizations, aligns resources by project or product and is called a
matrix organization. As illustrated in Figure 4.3, this structure allows the VPs some indirect
control over their related specialists, which would include the BA specialists, but also allows
direct control by the project or product manager. This structure, similar to the functional organizational
structure, does not promote the cross-department access suggested for a successful
implementation of a BA program.
Figure 4.3 Matrix organization structure

The literature suggests that the organizational structure that best aligns BA resources is one
in which a department, project, or team is formed in a staff structure where access to and from
the BA grouping of resources permits access to all areas within a firm, as illustrated in Figure
4.4. The dashed line indicates a staff (not line management) relationship. This centralized BA
organization structure minimizes investment costs by avoiding duplications found in both the
functional and the matrix styles of organization structures. At the same time, it maximizes
information flow between and across functional areas in the organization. This is a logical
structure for a BA group in its advisory role to the organization.

The advantages of this centralized structure include a reduction in the filtering of information traveling upward through the
organization, insulation from political interests, breakdown of the siloed functional-area
communication barriers, a more central platform for reviewing important analyses that require a
broader field of specialists, analytic-based group decision-making efforts, separation of the line
management leadership from potential clients (for example, the VP of marketing would not
necessarily come between the BA group working on customer service issues for a department
within marketing), and better connectivity between BA and all personnel within the area of
problem solving.
Figure 4.4 Centralized BA department, project, or team organization structure
Given the advocacy and logic recommending a centralized BA grouping, there are reasons
why not all BA groupings are centralized. These reasons help explain why BA initiatives that seek
to integrate and align BA resources into any type of BA group within the organization
sometimes fail.

Team management,

When it comes to getting the BA job done, it tends to fall to a BA team. For firms that employ
BA teams, the participants can be defined by the roles they play in the team effort. Some of the
roles BA team participants undertake and their typical background are presented in Table 4.2.

Aligning BA teams to achieve their tasks requires collaboration efforts from team members
and from their organizations. Like BA teams, collaboration involves working with people to
achieve a shared and explicit set of goals consistent with their mission. BA teams also have a
specific mission to complete. Collaboration through teamwork is the means to accomplish their
mission.
Team members’ need for collaboration is motivated by changes in the nature of work (no
more silos to hide behind, much more open environment, and so on), growth in professions
(for example, interactive jobs tend to be more professional, requiring greater variety in
expertise sharing), and the need to nurture innovation (creativity and innovation are fostered
by collaboration with a variety of people sharing ideas). To keep one’s job and to progress in
any business career, particularly in BA, team members must encourage working with other
members inside a team and out.
Table 4.2 BA Team Participant Roles*
For organizations, collaboration is motivated by the changing nature of information flow
(that is, hierarchical flows tend to be downward, whereas in modern organizations, flow is in
all directions) and changes in the scope of business operations (that is, going from domestic
to global allows for a greater flow of ideas and information from multiple sources in multiple
locations).

Management Issues,

Aligning organizational resources is a management function. There are general
management issues that are related to a BA program, and some are specifically important to
operating a BA department, project, or team. The ones covered in this section include
establishing an information policy, outsourcing business analytics, ensuring data quality,
measuring business analytics contribution, and managing change.

 Establishing an Information Policy:


There is a need to manage information. This is accomplished by establishing an information
policy to structure rules on how information and data are to be organized and maintained and who
is allowed to view the data or change it. The information policy specifies organizational rules for
sharing, disseminating, acquiring, standardizing, classifying, and inventorying all types of
information and data. It defines the specific procedures and accountabilities that identify which
users and organizational units can share information, where the information can be distributed,
and who is responsible for updating and maintaining the information.
In small firms, business owners might establish the information policy. For larger firms,
data administration may be responsible for the specific policies and procedures for data
management. Responsibilities could include developing the information policy, planning
data collection and storage, overseeing database design, developing the data dictionary, as
well as monitoring how information systems specialists and end-user groups use data.

 Outsourcing Business Analytics:


Outsourcing can be defined as a strategy by which an organization chooses to allocate some
business activities and responsibilities from an internal source to an external source. Outsourcing
business operations is a strategy that an organization can use to implement a BA program, run
BA projects, and operate BA teams. Any business activity can be outsourced, including BA.
Outsourcing is an important BA management activity that should be considered as a viable
alternative in planning an investment in any BA program.
BA is a staff function that is easier to outsource than other line management tasks, such as
running a warehouse. To determine if outsourcing is a useful option in BA programs,
management needs to balance the advantages of outsourcing with its disadvantages. Some of
the advantages of outsourcing BA include those listed in Table 4.4.

Table 4.4 Advantages of Outsourcing BA


Some of the disadvantages of outsourcing are presented in Table 4.5.

Table 4.5 Disadvantages of Outsourcing BA


 Ensuring Data Quality:
Business analytics, to be relevant, must be based on data of high quality. Data quality
refers to the accuracy, precision, and completeness of data. High-quality data is considered to
correctly reflect the real world from which it is extracted. Poor-quality data caused by data entry
errors, poorly maintained databases, out-of-date data, and incomplete data usually leads to bad
decisions and undermines BA within a firm. Organizationally, the database management system
(DBMS) personnel, mentioned earlier, are managerially responsible for ensuring data quality. Because
of its importance and the possible location of the BA department outside of the management
information systems department (which usually hosts the DBMS), it is imperative that whoever
leads the BA program should seek to ensure data quality efforts are undertaken.
An organization needs to identify and correct faulty data and establish routines and
procedures for editing data in the database. The analysis of data quality can begin with a data
quality audit, in which a structured survey or inspection of the accuracy and level of completeness of
the data is undertaken. This audit may cover the entire database, just a sample of files, or a survey
of end users for perceptions of the data quality. If during the data quality audit files are found
that have errors, a process called data cleansing or data scrubbing is undertaken to eliminate or
repair the data. Some of the areas in a data file that should be inspected in the audit and suggestions
on how to correct them are presented in Table 4.6.
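A data quality audit and cleansing pass might look roughly like the following pandas sketch; the order records and cleansing rules are hypothetical:

```python
import pandas as pd

# Made-up order records containing typical quality problems.
orders = pd.DataFrame({
    "order_id": [101, 102, 102, 103, 104],
    "customer": [" alice ", "BOB", "BOB", "Carol", None],
    "quantity": [3, -2, -2, 5, 4],          # a negative quantity is impossible
})

# Audit: quantify the problems before changing anything.
print("missing values:\n", orders.isna().sum())
print("duplicate rows:", orders.duplicated().sum())
print("negative quantities:", (orders["quantity"] < 0).sum())

# Cleansing / scrubbing: repair or remove faulty records.
clean = orders.drop_duplicates().dropna(subset=["customer"]).copy()
clean["customer"] = clean["customer"].str.strip().str.title()  # standardize text values
clean["quantity"] = clean["quantity"].clip(lower=0)            # repair impossible values
print(clean)
```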
Table 4.6 Quality Data Inspection Items and Recommendations

 Measuring Business Analytics Contribution:


The investment in BA must continually be justified by communicating the BA contribution
to the organization for ongoing projects. This means that performance analytics should be
computed for every BA project and BA team initiative. These analytics should provide an
estimate of the tangible and intangible values being delivered to the organization. This should
also involve establishing a communication strategy to promote the value being estimated.
Measuring the value and contributions that BA brings to an organization is essential to
helping the firm understand why the application of BA is worth the investment. Some BA
contribution estimates can be computed using standard financial methods, such as payback
period (how long it takes for the initial costs to be recovered from profits) or return on investment
(ROI), where dollar values or quantitative analysis are possible. When intangible contributions
are a major part of the contribution being delivered to the firm, other methods, such as cost/benefit
analysis that includes intangible benefits, should be used.
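For the tangible part of the contribution estimate, payback period and ROI can be computed directly from standard definitions; the project figures below are made up for illustration:

```python
def payback_period(initial_cost, annual_net_inflow):
    """Years until cumulative net benefits return the initial investment (simple, undiscounted)."""
    return initial_cost / annual_net_inflow

def roi(total_gain, total_cost):
    """Return on investment expressed as a fraction of cost."""
    return (total_gain - total_cost) / total_cost

# Hypothetical BA project: $200,000 invested, $80,000 of net benefit per year for 3 years.
print(payback_period(200_000, 80_000))   # 2.5 years
print(roi(3 * 80_000, 200_000))          # 0.2, i.e., a 20% return
```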

 Managing Change:

Wells (2000) found that what is critical in changing organizations is organizational culture
and the use of change management. Organizational culture is how an organization supports
cooperation, coordination, and empowerment of employees. Change management is defined as
an approach for transitioning the organization (individuals, teams, projects, departments) to a
changed and desired future state. Change management is a means of implementing change in an
organization, such as adding a BA department. Changes in an organization can be either planned
(a result of specific and planned efforts at change with direction by a change leader) or unplanned
(spontaneous changes without direction of a change leader).

The application of BA invariably will result in both types of changes because of BA’s specific
problem-solving role (a desired, planned change to solve a problem) and its opportunity-finding,
exploratory nature (i.e., unplanned new knowledge opportunity changes). Change
management can also target almost everything that makes up an organization (see Table 4.7).

Table 4.7 Change Management Targets

Some of these activities that lead to change management success are presented as best
practices in Table 4.8.
Table 4.8 Change Management Best Practices

Descriptive Analytics,
Descriptive analytics involves analyzing and summarizing historical data to gain insights into
patterns, trends, and characteristics of a particular phenomenon. This type of analytics focuses on
understanding what has happened in the past. Below are some examples of descriptive analytics and
how they can be illustrated:
1. **Histograms:**
- **Description:** Histograms are graphical representations of the distribution of a dataset. They
display the frequency or probability of different values in a dataset.
- **Illustration:** A histogram can be created to show the distribution of sales revenue for a
specific product over the past year. The x-axis represents revenue ranges, and the y-axis represents
the frequency or count of occurrences in each range.
2. **Pie Charts:**
- **Description:** Pie charts are circular statistical graphics that are divided into slices to
illustrate numerical proportions.
- **Illustration:** A pie chart can be used to show the percentage distribution of sales across
different product categories. Each slice represents a product category, and the size of the slice
corresponds to its percentage share of the total sales.
3. **Line Charts:**
- **Description:** Line charts are used to represent data points over a continuous interval or time
span. They are commonly used to show trends over time.
- **Illustration:** A line chart can illustrate the monthly website traffic over the past year. Each
point on the line represents the number of visits in a specific month, showing the overall trend of
website traffic.
4. **Scatter Plots:**
- **Description:** Scatter plots display individual data points on a two-dimensional graph, with
one variable on the x-axis and another on the y-axis. They are useful for identifying relationships
between variables.
- **Illustration:** A scatter plot can show the relationship between advertising spending and sales
revenue. Each point represents a specific time period, and the position of the point indicates the
corresponding values for advertising spending and sales revenue.
5. **Tabular Reports:**
- **Description:** Tabular reports present data in a table format, providing a detailed view of
individual data points or summary statistics.
- **Illustration:** A tabular report can display monthly expenses for a business, breaking down
costs into categories such as utilities, rent, and salaries. Each row represents a specific month, and
columns show the expenses for each category.
Descriptive analytics tools and visualizations help businesses and analysts make sense of historical
data, identify patterns, and draw insights to inform decision-making processes.
predictive analytics,
Predictive analytics involves using data, statistical algorithms, and machine learning techniques to
identify the likelihood of future outcomes based on historical data. Here are some examples
illustrating predictive analytics:
1. **Credit Scoring:**
- **Scenario:** A bank wants to predict the likelihood of a customer defaulting on a loan.
- **Illustration:** Using predictive analytics, the bank can develop a credit scoring model. The
model considers various factors such as credit history, income, and debt to predict the probability of
a customer defaulting on a loan. This helps the bank make informed decisions about loan approvals
and interest rates.
2. **Customer Churn Prediction:**
- **Scenario:** A telecommunications company wants to identify customers at risk of churning.
- **Illustration:** By analyzing historical customer data, including usage patterns, customer
service interactions, and billing information, a predictive model can be built to forecast which
customers are likely to churn. The company can then take proactive measures, such as targeted
promotions or retention offers, to reduce churn.
3. **Inventory Management:**
- **Scenario:** An e-commerce retailer wants to optimize inventory levels.
- **Illustration:** Predictive analytics can be applied to analyze past sales data, seasonality, and
other relevant factors. By forecasting future demand for each product, the retailer can optimize
inventory levels, reduce carrying costs, and ensure products are available when customers want to
purchase them.
4. **Healthcare Readmission Prediction:**
- **Scenario:** A hospital aims to predict the likelihood of a patient being readmitted after a
specific medical procedure.
- **Illustration:** Using predictive analytics, the hospital can analyze patient data, including
medical history, vital signs, and previous admissions. A predictive model can then identify patients
at a higher risk of readmission, allowing healthcare providers to intervene with appropriate care and
resources to reduce readmission rates.
5. **Predictive Maintenance in Manufacturing:**
- **Scenario:** A manufacturing plant wants to minimize equipment downtime by predicting
when machinery is likely to fail.
- **Illustration:** Sensor data from machines can be analyzed using predictive analytics to
identify patterns indicative of potential equipment failure. By predicting maintenance needs in
advance, the plant can schedule maintenance activities proactively, reducing unplanned downtime
and optimizing operational efficiency.
6. **Fraud Detection:**
- **Scenario:** A financial institution aims to detect fraudulent transactions.
- **Illustration:** Predictive analytics can analyze transaction data, looking for patterns and
anomalies that may indicate fraudulent activity. Machine learning models can continuously learn
from new data to improve their accuracy in identifying potentially fraudulent transactions in real-
time.
Predictive analytics enables organizations to make data-driven decisions, anticipate future trends,
and proactively address challenges or opportunities. It is a powerful tool for enhancing business
operations across various industries.
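As a sketch of the churn-prediction scenario above, the following scikit-learn example fits a logistic regression to synthetic customer records (the drivers of churn are assumed purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 500
usage_minutes = rng.normal(300, 80, n)
support_calls = rng.poisson(2, n)
# Assumption for the synthetic data: churn becomes more likely with low usage and many support calls.
churn_prob = 1 / (1 + np.exp(-(-0.01 * usage_minutes + 0.8 * support_calls)))
churned = rng.random(n) < churn_prob

X = np.column_stack([usage_minutes, support_calls])
X_train, X_test, y_train, y_test = train_test_split(X, churned, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("hold-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("predicted churn probability for one customer:",
      model.predict_proba([[150, 5]])[0, 1])   # low usage, many support calls
```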
Predictive Modelling,

Predictive modeling means developing models that can be used to forecast or predict
future events. In business analytics, models can be developed based on logic or data.

(A) Logic-Driven Models:


A logic-driven model is one based on experience, knowledge, and logical relationships of
variables and constants connected to the desired business performance outcome situation. The
question here is how to put variables and constants together to create a model that can predict
the future. Doing this requires business experience. Model building requires an understanding
of business systems and the relationships of variables and constants that seek to generate a
desirable business performance outcome. To help conceptualize the relationships inherent in a
business system, diagramming methods can be helpful. For example, the cause-and-effect
diagram is a visual aid that permits a user to hypothesize relationships between
potential causes of an outcome (see Figure 6.1). This diagram lists potential causes in terms of
human, technology, policy, and process resources in an effort to establish some basic
relationships that impact business performance. The diagram is used by tracing contributing and
relational factors from the desired business performance goal back to possible causes, thus
allowing the user to better picture sources of potential causes that could affect the performance.
This diagram is sometimes referred to as a fishbone diagram because of its appearance.

Figure 6.1 Cause-and-effect diagram


Another useful diagram for conceptualizing potential relationships with business
performance variables is called the influence diagram. According to Evans, influence diagrams
can be useful for conceptualizing the relationships of variables in the development of models. An
example of an influence diagram is presented in Figure 6.2. It maps the relationship of variables
and a constant to the desired business performance outcome of profit. From such a diagram, it is
easy to convert the information into a quantitative model with constants and variables that define
profit in this situation:
Profit = Revenue − Cost, or
Profit = (Unit Price × Quantity Sold) − [(Fixed Cost) + (Variable Cost × Quantity Sold)], or
P = (UP × QS) − [FC + (VC × QS)]

Figure 6.2 An influence diagram
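The influence-diagram relationships above translate directly into a small function; the numbers used below are illustrative only:

```python
def profit(unit_price, quantity_sold, fixed_cost, variable_cost):
    """P = (UP x QS) - [FC + (VC x QS)], as derived from the influence diagram."""
    revenue = unit_price * quantity_sold
    total_cost = fixed_cost + variable_cost * quantity_sold
    return revenue - total_cost

# Illustrative values: price $20, 1,000 units sold, $5,000 fixed cost, $8 variable cost per unit.
print(profit(20, 1_000, 5_000, 8))   # 20*1000 - (5000 + 8*1000) = 7000
```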

(B) Data-Driven Models:


Logic-driven modeling is often used as a first step to establish relationships through data-
driven models (using data collected from many sources to quantitatively establish model
relationships). To avoid duplication of content and focus on conceptual material in the chapters,
most of the computational aspects and some computer usage content are relegated to the
appendixes. In addition, some of the methodologies are illustrated in the case problems presented
in this book. Please refer to the Additional Information column in Table 6.1 to obtain further
information on the use and application of the data-driven models.
Table 6.1 Data-Driven Models

Predictive analytics analysis,

An ideal multiple-variable modeling approach that can be used in this situation to
explore variable importance in this case study and eventually lead to the development of a
predictive model for product sales is correlation and multiple regression. We will use both
Excel and IBM’s SPSS statistical packages to compute the statistics in this step of the BA
process.
First, we must consider the four independent variables—radio, TV, newspaper, POS—
before developing the model.
One way to see the statistical direction of the relationship (which is better than just
comparing graphic charts) is to compute the Pearson correlation coefficients r between each of
the independent variables and the dependent variable (product sales). The SPSS correlation
coefficients and their levels of significance are presented in Table 6.4. The comparable Excel
correlations are presented in Figure 6.5.

Table 6.4 SPSS Pearson Correlation Coefficients: Marketing/Planning

Figure 6.5 Excel Pearson correlation coefficients: marketing/planning case study

Although it can be argued that the positive or negative correlation coefficients should not
automatically discount any variable from what will be a predictive model, the negative
correlation of newspapers suggests that as a firm increases investment in newspaper ads, it will
decrease product sales. This does not make sense in this case study. Given the illogic of such a
relationship, its potential use as an independent variable in a model is questionable. Also, this
negative correlation poses several questions that should be considered. Was the data set
correctly collected? Is the data set accurate? Was the sample large enough to have included
enough data for this variable to show a positive relationship? Should it be included for further
analysis? Although it is possible that a negative relationship can statistically show up like this,
it does not make sense in this case. Based on this reasoning and the fact that the correlation is
not statistically significant, this variable (i.e., newspaper ads) will be removed from further
consideration in this exploratory analysis to develop a predictive model.
Some researchers might also exclude POS based on the insignificance (p = 0.479) of its
relationship with product sales. However, for purposes of illustration, we continue to consider it
a candidate for model inclusion. Also, the other two independent variables (radio and TV)
were both found to be significantly related to product sales, as reflected in the correlation
coefficients in the tables.

The procedure by which multiple regression can be used to evaluate which independent
variables are best to include or exclude in a linear model is called step-wise multiple
regression. It is based on an evaluation of regression models and their validation statistics—
specifically, the multiple correlation coefficients and the F-ratio from an ANOVA. SPSS
software and many other statistical systems build in the step-wise process. Some are called
backward step-wise regression, and some are called forward step-wise regression. The
backward step-wise regression starts with all the independent variables placed in the model,
and the step-wise process removes them one at a time, worst predictors first, until a
statistically significant model emerges. The forward step-wise regression starts with the best-
related variable (using correlation analysis as a guide) and then step-wise adds other variables
until adding more will no longer improve the accuracy of the model. The forward step-wise
regression process will be illustrated here manually. The first step is to generate individual
regression models and statistics for each independent variable with the dependent variable, one
at a time. These three models are presented in Tables 6.5, 6.6, and 6.7 for the POS, radio, and
TV variables, respectively. The comparable Excel regression statistics are presented in Tables
6.8, 6.9, and 6.10 for the POS, radio, and TV variables, respectively.
Table 6.5 SPSS POS Regression Model: Marketing/Planning Case Study
Table 6.6 SPSS Radio Regression Model: Marketing/Planning Case Study

Table 6.7 SPSS TV Regression Model: Marketing/Planning Case Study


Table 6.8 Excel POS Regression Model: Marketing/Planning Case Study

Table 6.9 Excel Radio Regression Model: Marketing/Planning Case Study


Data Mining,
It is a discovery-driven software application process that provides insights into business
data by finding hidden patterns and relationships in big or small data and inferring rules from them
to predict future behavior. These observed patterns and rules guide decision-making. This is not
just numbers, but text and social media information from the Web.

A Simple Illustration of Data Mining:


Suppose a grocery store has collected a big data file on what customers put into their baskets
at the market (the collection of grocery items a customer purchases at one time). The grocery
store would like to know if there are any associated items in a typical market basket. (For
example, if a customer purchases product A, she will most often associate it or purchase it with
product B.) If the customer generally purchases product A and B together, the store might only
need to advertise product A to gain both product A’s and B’s sales.

The value of knowing this association of products can improve the performance of the store
by reducing the need to spend money on advertising both products. The benefit is real if the
association holds true. Finding the association and proving it to be valid requires some analysis.

From the descriptive analytics analysis, some possible associations may have been
uncovered, such as product A’s and B’s association. With any size data file, the normal
procedure in data mining would be to divide the file into two parts. One is referred to as a training
data set, and the other as a validation data set. The training data set develops the association
rules, and the validation data set tests and proves that the rules work. Starting with the training
data set, a common data mining methodology is what-if analysis using logic-based software.
Excel and SPSS both have what-if logic-based software applications, and so do a number of
other software vendors. These software applications allow logic expressions. (For example, if
product A is present, then is product B present?) The systems can also provide frequency and
probability information to show the strength of the association. These software systems have
differing capabilities, which permit users to deterministically simulate different scenarios to
identify complex combinations of associations between product purchases in a market basket.
Once a collection of possible associations is identified and their probabilities are computed,
the same logic associations (now considered association rules) are rerun using the validation
data set. A new set of probabilities can be computed, and those can be statistically compared
using hypothesis testing methods to determine their similarity. Other software systems compute
correlations for testing purposes to judge the strength and the direction of the relationship. In
other words, if the consumer buys product A first, it could be referred to as the Head and product
B as the Body of the association. If the same basic probabilities are statistically significant, it
lends validity to the association rules and their use for predicting market basket item purchases
based on groupings of products.
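The support and confidence behind a rule such as "if product A, then product B" can be computed directly from basket data; a small sketch with made-up baskets follows:

```python
import pandas as pd

# One row per market basket; True means the product was in the basket.
baskets = pd.DataFrame({
    "product_A": [True, True, False, True, True, False],
    "product_B": [True, True, False, False, True, True],
})

both = (baskets["product_A"] & baskets["product_B"]).mean()   # support of {A, B}
support_a = baskets["product_A"].mean()                       # support of {A}

confidence = both / support_a                  # P(B in basket | A in basket)
lift = confidence / baskets["product_B"].mean()  # > 1 suggests a real association

print(f"support(A and B) = {both:.2f}, confidence(A -> B) = {confidence:.2f}, lift = {lift:.2f}")
```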
Data Mining Methodologies,
Data mining is an ideal predictive analytics tool used in the BA process. Table 6.2 lists a
small sampling of data mining methodologies to acquire different types of information. Some of
the same tools used in the descriptive analytics step are used in the predictive step but are employed
to establish a model (either based on logical connections or quantitative formulas) that may be
useful in predicting the future.

Table 6.2 Types of Information and Data Mining Methodologies

Several computer-based methodologies listed in Table 6.2 are briefly introduced here. Neural
networks are used to find associations where connections between words or numbers can be
determined. Specifically, neural networks can take large volumes of data and potential variables
and explore variable associations to express a beginning variable (referred to as an input layer),
through middle layers of interacting variables, and finally to an ending variable (referred to as an
output). More than just identifying simple one-on-one associations, neural networks link multiple
association pathways through big data like a collection of nodes in a network. These nodal
relationships constitute a form of classifying groupings of variables as related to one another, but
even more, related in complex paths with multiple associations. SPSS has two versions of neural
network software functions: Multilayer Perceptron (MLP) and Radial Basis Function (RBF). Both
procedures produce a predictive model for one or more dependent variables based on the values of
the predictor variables. Both allow a decision maker to develop, train, and use the software to
identify particular traits (such as bad loan risks for a bank) based on characteristics from data
collected on past customers.

Discriminant analysis is similar to a multiple regression model except that it permits
continuous independent variables and a categorical dependent variable. The analysis generates a
regression function whereby values of the independent variables can be incorporated to generate
a predicted value for the dependent variable. Similarly, logistic regression is like multiple
regression. Like discriminant analysis, its dependent variable can be categorical. The independent
variables, though, in logistic regression can be either continuous or categorical.
Hierarchical clustering is a methodology that establishes a hierarchy of clusters that can be
grouped by the hierarchy. Two strategies are suggested for this methodology: agglomerative
and divisive. The agglomerative strategy is a bottom-up approach, where one starts with each
item in the data and begins to group them. The divisive strategy is a top-down approach, where
one starts with all the items in one group and divides the group into clusters.
K-means clustering is a classification methodology that permits a set of data to be reclassified
into K groups, where K can be set as the number of groups desired. The algorithmic process
identifies initial candidates for the K groups and then iteratively searches other candidates in
the data set to be averaged into a mean value that represents a particular K group.
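A short scikit-learn sketch of K-means clustering on synthetic two-dimensional customer data, with K set to 3, is shown below:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Synthetic customer data: three loose groups in (annual spend, visits per month) space.
X = np.vstack([
    rng.normal([200, 2],  [30, 0.5], size=(50, 2)),
    rng.normal([600, 5],  [50, 1.0], size=(50, 2)),
    rng.normal([1200, 9], [80, 1.5], size=(50, 2)),
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
print("cluster centers (spend, visits):\n", km.cluster_centers_)
```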

Prescriptive analytics and its step in the business analytics Process,

(A) Case Study Background Review:

The case study firm had collected a random sample of monthly sales information, presented
in Figure 6.4, listed in thousands of dollars. What the firm wants to know is, given a fixed budget
of $350,000 for promoting this service product, when offered again, how best should the
company allocate budget dollars in hopes of maximizing the future estimated month’s product
sales? Before making any allocation of budget, there is a need to understand how to estimate
future product sales. This requires understanding the behavior of product sales relative to sales
promotion efforts using radio, newspaper, TV, and point-of-sale (POS) ads.
Figure 6.4 Data for marketing/planning case study

The analysis also revealed little regarding the relationship of newspaper and POS ads to
product sales. So although radio and TV commercials are most promising, a more in-depth
predictive analytics analysis is called for to accurately measure and document the degree of
relationship that may exist in the variables to determine the best predictors of product sales.

Prescriptive Modelling,

After undertaking the descriptive and predictive analytics steps in the BA process, one
should be positioned to undertake the final step: prescriptive analytics analysis. The prior
analysis should provide a forecast or prediction of what future trends in the business may
hold. For example, there may be significant statistical measures of increased (or decreased)
sales, profitability trends accurately measured in dollars for new market opportunities, or
measured cost savings from a future joint venture.
Step 3 of the BA process, prescriptive analytics, involves the application of decision
science, management science, or operations research methodologies to make best use of
allocable resources. These are mathematically based methodologies and algorithms
designed to take variables and other parameters into a quantitative framework and generate
an optimal or near-optimal solution to complex problems. These methodologies can be used
to optimally allocate a firm’s limited resources to take best advantage of the opportunities
it has found in the predicted future trends. Limits on human, technology, and financial
resources prevent any firm from going after all the opportunities. Using prescriptive
analytics allows the firm to allocate limited resources to optimally or near-optimally achieve
the objectives as fully as possible.
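As one sketch of how an optimization model allocates limited resources, the linear program below (SciPy's linprog) splits a $350,000 promotion budget across radio, TV, and POS; the per-dollar sales-lift coefficients and channel caps are entirely hypothetical and are not the case-study regression estimates:

```python
from scipy.optimize import linprog

# Hypothetical estimated sales lift per promotion dollar (NOT the case-study results).
lift = [1.8, 2.4, 1.1]                 # radio, TV, POS

# linprog minimizes, so negate the objective to maximize total estimated lift.
c = [-v for v in lift]
A_ub = [[1, 1, 1]]                     # total spending cannot exceed the budget
b_ub = [350_000]
bounds = [(0, 200_000),                # hypothetical per-channel spending caps
          (0, 250_000),
          (0, 100_000)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print("radio, TV, POS allocation:", res.x)
print("estimated total sales lift:", -res.fun)
```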
The listing of the prescriptive analytic methodologies as they are in some cases utilized in
the BA process is again presented in Figure 7.1 to form the basis of this chapter’s content.

Figure 7.1 Prescriptive analytic methodologies

Prescriptive Modeling:
The listing of prescriptive analytic methods and models in Figure 7.1 is but a small
grouping of many operations research, decision science, and management science
methodologies that are applied in this step of the BA process. The explanation and use of
most of the methodologies in Table 7.1 are explained throughout this book. (See the Additional
Information column in Table 7.1.)
nonlinear Optimization.

When business performance cost or profit functions become too complex for simple
linear models to be useful, exploration of nonlinear functions is a standard practice in BA.
Although the predictive nature of exploring for a mathematical expression to denote a trend
or establish a forecast falls mainly in the predictive analytics step of BA, the use of the
nonlinear function to optimize a decision can fall in the prescriptive analytics step.
There are many mathematical programming nonlinear methodologies and solution
procedures designed to generate optimal business performance solutions. Most of them
require careful estimation of parameters that may or may not be accurate, particularly given
the precision required of a solution that can be so precariously dependent upon parameter
accuracy. This precision is further complicated in BA by the large data files that should be
factored into the model-building effort.
To overcome these limitations and be more inclusive in the use of large data, regression
software can be applied. Curve Fitting software can be used to generate predictive
analytic models that can also be utilized to aid in making prescriptive analytic decisions.
For purposes of illustration, SPSS’s Curve Fitting software will be used in this chapter.
Suppose that a resource allocation decision is being faced whereby one must decide how
many computer servers a service facility should purchase to optimize the firm’s costs of
running the facility. The firm’s predictive analytics effort has shown a growth trend. A new
facility is called for if costs can be minimized. The firm has a history of setting up large and
small service facilities and has collected the 20 data points in Figure 7.2.

Figure 7.2 Data and SPSS Curve Fitting function selection window

In this server problem, the basic data has a U-shaped function, as
presented in Figure 7.3. This is a classic shape for most cost functions in business. In this
problem, it represents the balancing of having too few servers (resulting in a costly loss of
customer business through dissatisfaction and complaints with the service) or too many
servers (excessive waste in investment costs as a result of underutilized servers). Although
this is an overly simplified example with little and nicely ordered data for clarity purposes,
in big data situations, cost functions are considerably less obvious.
Figure 7.3 Server problem basic data cost function

The first step in the curve-fitting methodology is to generate the best-fitting
curve to the data. By selecting all the SPSS models in Figure 7.2, the software fits each
model to the data using the regression process of minimizing distance from a line. The result is
a series of regression models and statistics, including ANOVA and other test statistics.
It is known from the earlier discussion of regression that the adjusted R-square statistic
can reveal the best estimated relationship between the independent variable (number of servers) and
the dependent variable (total cost). These statistics are presented in Table 7.2. The best (largest)
adjusted R-square value occurs with the quadratic model, followed by the cubic model.
More detailed supporting statistics for both models are presented in Table 7.3. The graph
of all the SPSS curve-fitting models appears in Figure 7.4.
Table 7.2 Adjusted R-Square Values of All SPSS Models
Table 7.3 Quadratic and Cubic Model SPSS Statistics

Figure 7.4 Graph of all SPSS curve-fitting models

From Table 7.3, the two resulting statistically significant curve-fitted models are:

Yp = 35417.772 − 5589.432 X + 268.445 X²  [Quadratic model]
Yp = 36133.696 − 5954.738 X + 310.895 X² − 1.347 X³  [Cubic model]

where Yp = the forecasted or predicted total cost, and X = the number of computer servers.
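A minimal sketch of the prescriptive step that follows, assuming Python with NumPy rather than SPSS: using only the fitted coefficients reported above, it locates the server count on a 1–20 range (matching the data) that minimizes each model's predicted total cost. In practice the result would be rounded to a whole number of servers and both neighbouring integers compared.

```python
import numpy as np

# Fitted curve coefficients reported above (highest-degree term first).
quadratic = np.poly1d([268.445, -5589.432, 35417.772])
cubic = np.poly1d([-1.347, 310.895, -5954.738, 36133.696])

def cost_minimizer(poly, lo=1, hi=20):
    """Return the server count in [lo, hi] with the lowest predicted total cost."""
    # Candidate minimizers: real critical points of the polynomial plus the interval ends.
    critical = [x.real for x in poly.deriv().roots
                if abs(x.imag) < 1e-9 and lo <= x.real <= hi]
    return min(critical + [lo, hi], key=poly)

for name, model in [("Quadratic", quadratic), ("Cubic", cubic)]:
    x_star = cost_minimizer(model)
    print(f"{name}: minimum predicted cost {model(x_star):,.0f} at about {x_star:.1f} servers")
```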
Unit 4:
Forecasting Techniques: Qualitative and Judgmental Forecasting, Statistical Forecasting Models,
Forecasting Models for Stationary Time Series, Forecasting Models for Time Series with a Linear
Trend, Forecasting Time Series with Seasonality, Regression Forecasting with Causal Variables,
Selecting Appropriate Forecasting Models. Monte Carlo Simulation and Risk Analysis: Monte Carlo
Simulation Using Analytic Solver Platform, New-Product Development Model, Newsvendor Model,
Overbooking Model, Cash Budget Model.
Qualitative and Judgmental Forecasting:

Qualitative and judgmental techniques rely on experience and intuition; they are
necessary when historical data are not available or when the decision maker needs to forecast
far into the future. Another use of judgmental methods is to incorporate nonquantitative
information, such as the impact of government regulations or competitor behavior, into a
quantitative forecast. Judgmental techniques range from simple methods, such as a manager's
opinion or a group-based jury of executive opinion, to more structured approaches such as
historical analogy and the Delphi method.

(A) HISTORICAL ANALOGY:

One judgmental approach is historical analogy, in which a forecast is obtained
through a comparative analysis with a previous situation. For example, if a new
product is being introduced, the response of consumers to marketing campaigns
for similar, previous products can be used as a basis to predict how the new
marketing campaign might fare.

(B) DELPHI METHOD:

A popular judgmental forecasting approach, called the Delphi method, uses a
panel of experts, whose identities are typically kept confidential from one another,
to respond to a sequence of questionnaires. After each round of responses,
individual opinions, edited to ensure anonymity, are shared, allowing each expert
to see what the others think. The Delphi method promotes unbiased exchanges of
ideas and discussion and usually results in some convergence of opinion. It is one
of the better approaches to forecasting long-range trends and impacts.

Statistical Forecasting Models:

Statistical time-series models find greater applicability for short-range forecasting
problems. A time series is a stream of historical data, such as weekly sales. We characterize
the values of a time series over T periods as A_t, t = 1, 2, ..., T. Time-series models assume
that whatever forces have influenced sales in the recent past will continue into the near future;
thus, forecasts are developed by extrapolating these data into the future. Time series generally
have one or more of the following components: random behavior, trends, seasonal effects, or
cyclical effects. Time series that do not have trend, seasonal, or cyclical effects but are
relatively constant and exhibit only random behavior are called stationary time series.

Many forecasts are based on analysis of historical time-series data and are predicated
on the assumption that the future is an extrapolation of the past. A trend is a gradual upward
or downward movement of a time series over time.

Time series may also exhibit short-term seasonal effects (over a year, month, week, or
even a day) as well as longer-term cyclical effects or nonlinear trends. A seasonal effect is one
that repeats at fixed intervals of time, typically a year, month, week, or day. At a neighborhood
grocery store, for instance, short-term seasonal patterns may occur over a week, with the
heaviest volume of customers on weekends; seasonal patterns may also be evident during the
course of a day, with higher volumes in the mornings and late afternoons. Figure 9.2 shows
seasonal changes in natural gas usage for a homeowner over the course of a year (Excel file
Gas & Electric). Cyclical effects describe ups and downs over a much longer time frame, such
as several years. Figure 9.3 shows a chart of the data in the Excel file Federal Funds Rates.
We see some evidence of long-term cycles in the time series, driven by economic factors such
as periods of inflation and recession.

Figure 9.2 Seasonal Effects in Natural Gas Usage

Figure 9.3 Cyclical Effects in Federal Funds Rates

Although visual inspection of a time series to identify trends, seasonal, or cyclical
effects may work in a naïve fashion, such unscientific approaches may be unsettling to a
manager making important decisions. Subtle effects and interactions of seasonal and cyclical
factors may not be evident from simple visual extrapolation of data. Statistical methods,
which involve more formal analyses of time series, are invaluable in developing good
forecasts. A variety of statistically based forecasting methods for time series are commonly
used. Among the most popular are moving average methods, exponential smoothing, and
regression analysis. These can be implemented very easily on a spreadsheet using basic
functions and Data Analysis tools available in Microsoft Excel, as well as with more powerful
software such as XLMiner. Moving average and exponential smoothing models work best for
time series that do not exhibit trends or seasonal factors. For time series that involve trends
and/or seasonal factors, other techniques have been developed. These include double moving
average and double exponential smoothing models, seasonal additive and multiplicative models, and
Holt-Winters additive and multiplicative models.

Forecasting Models for Stationary Time Series:
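For stationary time series, the most common choices are the simple moving average and simple exponential smoothing methods mentioned above. A minimal Python sketch (an illustration only, not the Excel or XLMiner procedures), using a short hypothetical demand series:

```python
import pandas as pd

# Hypothetical stationary weekly demand (illustrative values only).
demand = pd.Series([102, 98, 105, 101, 99, 104, 100, 103, 97, 102])

# k-period simple moving average: the forecast for the next period is the
# mean of the last k observations.
k = 3
moving_avg_forecast = demand.rolling(window=k).mean().iloc[-1]

# Simple exponential smoothing: F(t+1) = alpha*A(t) + (1 - alpha)*F(t).
alpha = 0.3
forecast = demand.iloc[0]              # initialize the first forecast with the first observation
for actual in demand[1:]:
    forecast = alpha * actual + (1 - alpha) * forecast

print(f"{k}-period moving average forecast: {moving_avg_forecast:.1f}")
print(f"Exponential smoothing forecast (alpha={alpha}): {forecast:.1f}")
```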


Forecasting Models for Time Series with a Linear Trend:

For time series with a linear trend but no significant seasonal components, double
moving average and double exponential smoothing models are more appropriate than simple
moving average or exponential smoothing models. Both methods are based on the linear
trend equation:

F_{t+k} = a_t + b_t k    (9.6)

That is, the forecast for k periods into the future from period t is a function of a base
value a_t, also known as the level, and a trend, or slope, b_t. Double moving average and double
exponential smoothing differ in how the data are used to arrive at appropriate values for a_t and
b_t. Because the calculations are more complex than for simple moving average and exponential
smoothing models, it is easier to use forecasting software than to try to implement the models
directly on a spreadsheet. Therefore, we do not discuss the theory or formulas underlying the
methods. XLMiner does not support a procedure for double moving average; however, it does
provide one for double exponential smoothing.

DOUBLE EXPONENTIAL SMOOTHING:

In double exponential smoothing, the estimates of a_t and b_t are obtained from the following
equations:

a_t = αF_t + (1 − α)(a_{t−1} + b_{t−1})
b_t = β(a_t − a_{t−1}) + (1 − β)b_{t−1}    (9.7)

In essence, we are smoothing both parameters of the linear trend model. From the first
equation, the estimate of the level in period t is a weighted average of the observed value at
time t and the predicted value at time t, a_{t−1} + b_{t−1}, based on simple exponential smoothing.
For large values of α, more weight is placed on the observed value; lower values of α put more
weight on the smoothed predicted value. Similarly, from the second equation, the estimate of
the trend in period t is a weighted average of the difference in the estimated levels in periods
t and t − 1 and the estimate of the trend in period t − 1.

Larger values of β place more weight on the difference in the levels, whereas lower values
of β put more emphasis on the previous estimate of the trend. Initial values are chosen as a_1 = A_1
and b_1 = A_2 − A_1. Equations (9.7) must then be applied to compute a_t and b_t for the entire
time series in order to generate forecasts into the future. As with simple exponential
smoothing, we are free to choose the values of α and β; however, it is easier to let XLMiner
optimize these values using historical data.
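A minimal Python sketch of equations (9.6) and (9.7), using the initialization described above (a_1 = A_1, b_1 = A_2 − A_1). It is an illustration with hand-picked α and β and a hypothetical series, not the optimized values XLMiner would find:

```python
def double_exponential_smoothing(series, alpha, beta, k=1):
    """Return the k-period-ahead forecast F(T+k) = a_T + b_T*k for a trended series."""
    a = series[0]                 # level: a_1 = A_1
    b = series[1] - series[0]     # trend: b_1 = A_2 - A_1
    for actual in series[1:]:
        a_prev = a
        a = alpha * actual + (1 - alpha) * (a_prev + b)   # equation (9.7), level
        b = beta * (a - a_prev) + (1 - beta) * b          # equation (9.7), trend
    return a + b * k                                      # equation (9.6)

# Hypothetical upward-trending series (illustrative values only).
sales = [120, 126, 131, 138, 142, 149, 155, 161]
print(round(double_exponential_smoothing(sales, alpha=0.5, beta=0.3, k=2), 1))
```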

Forecasting Time Series with Seasonality:

Quite often, time-series data exhibit seasonality, especially on an annual basis. When
a time series exhibits seasonality, techniques that explicitly account for the seasonal component
provide better forecasts than those that do not.

(A) REGRESSION-BASED SEASONAL FORECASTING MODELS:

One approach is to use linear regression. Multiple linear regression models with
categorical variables can be used for time series with seasonality. To do this, we use
dummy categorical variables for the seasonal components, as sketched below.
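A minimal sketch of such a model for quarterly data, assuming Python with NumPy (all values are hypothetical; any statistics package would do): the design matrix holds a time index plus dummy variables for three of the four quarters, and ordinary least squares estimates the trend and seasonal effects.

```python
import numpy as np

# Two years of hypothetical quarterly sales with a trend and a fourth-quarter peak.
sales = np.array([110, 100, 105, 140, 122, 111, 117, 155])
t = np.arange(1, len(sales) + 1)          # time index 1..8
quarter = (t - 1) % 4 + 1                 # quarter of each observation: 1, 2, 3, 4, ...

# Design matrix: intercept, time, and dummies for quarters 2-4 (quarter 1 is the baseline).
X = np.column_stack([np.ones_like(t), t] + [(quarter == q).astype(float) for q in (2, 3, 4)])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

# Forecast the next period (t = 9, which falls in quarter 1, so all dummies are zero).
x_next = np.array([1, 9, 0, 0, 0])
print(round(float(x_next @ coef), 1))
```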

(B) HOLT-WINTERS MODELS FOR SEASONAL FORECASTING:

These models are based on the work of two researchers: C. C. Holt, who developed the basic
approach, and P. R. Winters, who extended Holt's work. Hence, these approaches are commonly
referred to as Holt-Winters models. Holt-Winters models are similar to exponential
smoothing models in that smoothing constants are used to smooth out variations in the
level and seasonal patterns over time. For time series with seasonality but no trend,
XLMiner supports a Holt-Winters method but does not have the ability to optimize the
parameters.

HOLT-WINTERS MODELS FOR FORECASTING TIME SERIES WITH
SEASONALITY AND TREND:

Many time series exhibit both trend and seasonality. Such might be the case for growing
sales of a seasonal product. These models combine elements of both the trend and seasonal
models. Two types of Holt-Winters smoothing models are often used.

The Holt-Winters additive model is based on the equation

F_{t+1} = a_t + b_t + S_{t−s+1}    (9.8)

and the Holt-Winters multiplicative model is

F_{t+1} = (a_t + b_t)S_{t−s+1}    (9.9)

The additive model applies to time series with relatively stable seasonality, whereas
the multiplicative model applies to time series whose seasonal amplitude increases or decreases
over time. Therefore, a chart of the time series should be viewed first to identify the appropriate
type of model to use. Three parameters, α, β, and γ, are used to smooth the level, trend, and
seasonal factors in the time series. XLMiner supports both models.
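A minimal sketch using one widely available Python implementation, statsmodels' ExponentialSmoothing, as an alternative to XLMiner; the monthly figures below are hypothetical. The trend and seasonal arguments select the additive or multiplicative forms discussed above:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Three years of hypothetical monthly sales with growth and a year-end peak.
sales = pd.Series(
    [100, 96, 110, 118, 125, 140, 150, 148, 135, 125, 160, 190,
     112, 108, 123, 132, 140, 157, 168, 166, 151, 140, 179, 213,
     125, 121, 138, 148, 157, 176, 188, 186, 169, 157, 200, 238]
)

# Additive trend with multiplicative seasonality (the seasonal swing grows with the level).
model = ExponentialSmoothing(sales, trend="add", seasonal="mul", seasonal_periods=12).fit()
print(model.forecast(6).round(1))   # forecasts for the next six months
```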

Regression Forecasting with Causal Variables:

In many forecasting applications, other independent variables besides time, such as
economic indexes or demographic factors, may influence the time series. For example, a
manufacturer of hospital equipment might include variables such as hospital capital spending and
changes in the proportion of people over the age of 65 when building models to forecast future
sales. Explanatory/causal models, often called econometric models, seek to identify factors
that statistically explain the patterns observed in the variable being forecast, usually with
regression analysis. We will use a simple example of forecasting gasoline sales to illustrate
econometric modeling.

FORECASTING GASOLINE SALES USING A SIMPLE REGRESSION MODEL:

Figure 9.27 shows gasoline sales over 10 weeks during June through August,
along with the average price per gallon and a chart of the gasoline sales time
series with a fitted trendline (Excel file Gasoline Sales). During the summer
months, it is not unusual to see an increase in sales as more people go on
vacation. The chart shows a linear trend, although R² is not very high. The
trendline is:

sales = 4,790.1 + 812.99 × week

Using this model, we would predict sales for week 11 as sales = 4,790.1 +
812.99(11) = 13,733 gallons.

Gasoline Sales Data and Trendline

In the gasoline sales data, we also see that the average price per gallon changes each
week, and this may influence consumer sales. Therefore, the sales trend might not simply be
a factor of steadily increasing demand; it might also be influenced by the average price
per gallon, which can be considered a causal variable. Multiple linear regression provides a
technique for building forecasting models that incorporate not only time but other potential
causal variables as well.
INCORPORATING CAUSAL VARIABLES IN A REGRESSION FORECASTING
MODEL:

For the gasoline sales data, we can incorporate the price per gallon by using two
independent variables. This results in the multiple regression model

sales = β0 + β1 × week + β2 × price/gallon

The results are shown in Figure 9.28, and the estimated regression model is

sales = 72,333.08 + 508.67 × week − 16,463.2 × price/gallon

Notice that the R² value is higher when both variables are included, explaining more
than 86% of the variation in the data. If the company estimates that the average price
for the next week will drop to $3.80, the model would forecast the sales for week 11 as

sales = 72,333.08 + 508.67(11) − 16,463.2(3.80) = 15,368 gallons

FIG. 9.28 Regression Results for Gasoline Sales
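A minimal sketch of how this causal forecast could be reproduced in Python. Only the coefficients reported above are used, and the helper name is hypothetical:

```python
# Coefficients from the fitted multiple regression model reported above.
B0, B1, B2 = 72333.08, 508.67, -16463.2

def forecast_sales(week, price_per_gallon):
    """Forecast gasoline sales (gallons) from the week index and average price per gallon."""
    return B0 + B1 * week + B2 * price_per_gallon

# Week 11 forecast at an estimated average price of $3.80 per gallon.
print(round(forecast_sales(11, 3.80)))   # approximately 15,368 gallons
```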

Selecting Appropriate Forecasting Models.


Considerations for Selecting a Forecasting Model:
1. Data Characteristics:
 Understand the characteristics of your data, such as seasonality, trend, and the presence
of outliers.
2. Model Complexity:
 Choose a model that balances complexity with interpretability, especially when dealing
with limited data.
3. Accuracy and Performance:
 Evaluate the performance of different models using appropriate metrics and choose the
one that provides the most accurate forecasts.
4. Interpretability:
 Consider the interpretability of the model, especially if stakeholders need to understand
and trust the forecasting results.
5. Data Availability:
 Some models may require a large amount of data to perform well, while others can
handle smaller datasets.
6. Computational Resources:
 Assess the computational resources required for training and deploying the model,
especially for real-time forecasting applications.
7. Expertise and Resources:
 Consider the expertise available within the organization for implementing and
maintaining the chosen forecasting model.

Monte Carlo Simulation and Risk Analysis:


Monte Carlo Simulation Using Analytic Solver Platform:

MONTE CARLO SIMULATION USING ANALYTIC SOLVER PLATFORM:

To use Analytic Solver Platform, you must perform the following steps:

(1) Develop the spreadsheet model.


(2) Determine the probability distributions that describe the uncertain inputs inyour
model.
(3) Identify the output variables that you wish to predict.
(4) Set the number of trials or repetitions for the simulation.
(5) Run the simulation.
(6) Interpret the results.
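The same six steps can also be illustrated outside a spreadsheet. Below is a minimal Python/NumPy sketch of a Monte Carlo simulation for a hypothetical profit model; all numbers and distribution choices are assumptions for illustration, not Analytic Solver Platform output:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
trials = 5000                                   # step 4: number of trials

# Steps 1-2: a profit model whose uncertain inputs are described by probability distributions.
unit_price = 40.0
unit_cost = rng.triangular(left=22, mode=25, right=30, size=trials)   # uncertain unit cost
demand = rng.normal(loc=1200, scale=200, size=trials)                 # uncertain demand
fixed_cost = 12000.0

# Step 3: the output variable we wish to predict.
profit = (unit_price - unit_cost) * demand - fixed_cost

# Steps 5-6: run the trials (vectorized above) and interpret the results.
print(f"Mean profit: {profit.mean():,.0f}")
print(f"5th-95th percentile: {np.percentile(profit, 5):,.0f} to {np.percentile(profit, 95):,.0f}")
print(f"Probability of a loss: {(profit < 0).mean():.1%}")
```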

DEFINING UNCERTAIN MODEL INPUTS:

When model inputs are uncertain, we need to characterize them by some
probability distribution. For many decision models, empirical data may be available, either
in historical records or collected through special efforts. For example, maintenance records
might provide data on machine failure rates and repair times, or observers might collect
data on service times in a bank or post office. This provides a factual basis for choosing
the appropriate probability distribution to model the input variable.
There are two ways to define uncertain variables in Analytic Solver Platform. One
is to use the custom Excel functions for generating random samples from probability
distributions. The second way is to use the Distributions button in the Analytic Solver
Platform ribbon. First, select the cell in the spreadsheet for which you want to define a
distribution. Click on the Distributions button as shown in Figure 12.3. Choose a distribution
from one of the categories in the list that pops up. This will display a dialog in which you
may define the parameters of the distribution.

FIG. 12.3. Analytic Solver Platform Distributions Options

 DEFINING OUTPUT CELLS:

To define a cell you wish to predict and create a distribution of output values
from your model (which Analytic Solver Platform calls an uncertain function cell),
first select it, and then click on the Results button in the Simulation Model group in the
Analytic Solver Platform ribbon. Choose the Output option and then In Cell.

 RUNNING A SIMULATION:

To run a simulation, first click on the Options button in the Options group in the Analytic Solver
Platform ribbon. This displays a dialog (see Figure 12.7) in which you can specify the number of
trials and other options to run the simulation (make sure the Simulation tab is selected). Trials per
Simulation allows you to choose the number of times that Analytic Solver Platform will generate
random values for the uncertain cells in the model and recalculate the entire spreadsheet. Because
Monte Carlo simulation is essentially statistical sampling, the larger the number of trials you use, the
more precise the result will be.
Unless the model is extremely complex, a large number of trials will not unduly tax today's
computers, so we recommend that you use at least 5,000 trials (the educational version restricts
this to a maximum of 10,000 trials). You should use a larger number of trials as the number of
uncertain cells in your model increases, so that the simulation can generate representative
samples from all the assumed distributions. You may run more than one simulation if you
wish to examine the variability in the results.
FIG. 12.7. Analytic Solver Platform Options Dialog

The procedure that Analytic Solver Platform uses generates a stream of random
numbers from which the values of the uncertain inputs are selected from their probability
distributions. Fixing the random seed in the options makes this stream reproducible: as long
as you use the same number, the assumption values generated will be the same for all
simulations.

Analytic Solver Platform has alternative sampling methods; the two most
common are Monte Carlo and Latin Hypercube sampling. Monte Carlo sampling
selects random variates independently over the entire range of possible values of the
distribution. With Latin Hypercube sampling, the uncertain variable's probability
distribution is divided into intervals of equal probability and a value is generated
randomly within each interval. Latin Hypercube sampling results in a more even
distribution of output values because it samples the entire range of the distribution in a
more consistent manner, thus achieving more accurate forecast statistics (particularly the
mean) for a fixed number of Monte Carlo trials. However, Monte Carlo sampling is more
representative of reality and should be used if you are interested in evaluating the model
performance under various what-if scenarios. Unless you are an advanced user, we
recommend leaving the other options at their default values.
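A minimal illustration of the difference, assuming Python with SciPy rather than Analytic Solver Platform: both approaches draw 1,000 values from the same normal distribution, but the Latin Hypercube draw stratifies the probability scale into equal-probability intervals first, so its sample mean is usually closer to the true mean.

```python
import numpy as np
from scipy.stats import norm, qmc

true_mean, true_sd, n = 100.0, 15.0, 1000
rng = np.random.default_rng(7)

# Plain Monte Carlo sampling: independent draws over the whole distribution.
mc_sample = rng.normal(true_mean, true_sd, size=n)

# Latin Hypercube sampling: stratified uniform draws mapped through the inverse CDF.
lhs_uniform = qmc.LatinHypercube(d=1, seed=7).random(n).ravel()
lhs_sample = norm.ppf(lhs_uniform, loc=true_mean, scale=true_sd)

print(f"Monte Carlo mean:     {mc_sample.mean():.2f}")
print(f"Latin Hypercube mean: {lhs_sample.mean():.2f}  (true mean = {true_mean})")
```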

The last step is to run the simulation by clicking the Simulate button in the Solve
Action group. When the simulation finishes, you will see a message “Simulation
finished successfully” in the lower-left corner of the Excel window.

 VIEWING & ANALYZING RESULTS:

You may specify whether you want output charts to appear automatically after
a simulation is run by clicking the Options button in the Analytic Solver Platform
ribbon and either checking or unchecking the box Show charts after simulation in the
Charts tab. You may also view the results of the simulation at any time by double-clicking
on an output cell that contains the PsiOutput() function, or by choosing Simulation from
the Reports button in the Analysis group in the Analytic Solver Platform ribbon. This
displays a window with various tabs showing different charts to analyze the results.

New-Product Development Model


New-Product Development (NPD) models provide structured frameworks to guide organizations
through the process of bringing a new product or service to market. These models help ensure that the
development process is systematic, efficient, and aligned with the organization's goals.
Newsvendor Model,
The Newsvendor Model, also known as the Newsboy Model or Inventory Management Model, is a
mathematical model used in inventory and supply chain management to determine the optimal order
quantity for perishable or seasonal goods. The model helps organizations strike a balance between the
costs associated with carrying excess inventory and the costs of potential stockouts. It is named after the
analogy of a newspaper vendor trying to determine the optimal number of newspapers to order for daily
sales.
### Key Concepts of the Newsvendor Model:

1. **Demand Distribution:**
- The demand for the product is assumed to follow a probability distribution. The actual demand is
uncertain and can vary.
2. **Order Quantity (Q):**
- The decision variable is the order quantity, representing the number of units that the retailer orders to
meet customer demand.
3. **Unit Cost and Selling Price:**
- The retailer incurs a cost (c) per unit of the product ordered. The selling price (p) per unit is usually
higher than the unit cost.
4. **Salvage Value (V):**
- If the retailer orders more units than demanded, the excess units may have a lower salvage value (V)
or disposal cost. Salvage value represents the revenue generated from selling excess units, returning
unsold units to the supplier, or other disposal methods.
5. **Shortage Cost (h):**
- If the retailer orders fewer units than demanded, there is a shortage cost (h) associated with the lost
sales, backordering, or other costs related to unmet demand.

### Mathematical Formulation:

The Newsvendor Model seeks the order quantity (Q) that minimizes total expected cost, considering
the trade-off between excess inventory costs and shortage costs. The optimal order quantity (Q*) can be
determined using the critical ratio (CR), the ratio of the shortage cost to the sum of the shortage
and excess costs:

CR = Cu / (Cu + Co)

where Cu is the per-unit shortage (underage) cost and Co is the per-unit excess (overage) cost. Q* is the
smallest quantity for which the probability that demand does not exceed Q* is at least CR; with a
continuous demand distribution F, this is the quantity satisfying F(Q*) = CR.
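A minimal Python sketch under assumed values of the quantities defined above (unit cost c, selling price p, salvage value V, shortage cost h, and a normal demand distribution, all hypothetical). It takes the underage cost as the lost margin plus the shortage penalty and the overage cost as cost minus salvage, which is a common convention rather than the only possible one:

```python
from scipy.stats import norm

# Hypothetical parameters for a seasonal item.
c, p, V, h = 6.0, 10.0, 2.0, 1.5        # unit cost, selling price, salvage value, shortage cost
mu, sigma = 500, 80                     # assumed normal demand (mean, standard deviation)

Cu = (p - c) + h                        # underage (shortage) cost per unit short
Co = c - V                              # overage (excess) cost per unit left over
CR = Cu / (Cu + Co)                     # critical ratio

Q_star = norm.ppf(CR, loc=mu, scale=sigma)   # F(Q*) = CR for normal demand
print(f"Critical ratio: {CR:.3f}, optimal order quantity: about {Q_star:.0f} units")
```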

### Key Considerations:


- The Newsvendor Model assumes that the demand distribution is known, and it operates under the
assumption of a single-period decision context.
- The critical ratio helps determine the portion of demand that should be satisfied to minimize total
costs.
- The model is particularly useful for industries with perishable or seasonal products, such as fashion
apparel, fresh produce, or holiday-related items.
- The Newsvendor Model provides insights into the trade-offs between ordering more or fewer units and
helps optimize inventory decisions based on cost considerations.
Overbooking Model,
An Overbooking Model is a strategy employed by service providers (such as airlines, hotels, and event
organizers) to maximize revenue by deliberately accepting more reservations or bookings than the
available capacity. The idea behind overbooking is based on the statistical probability that not all
customers who make reservations will actually show up. This strategy aims to optimize resource
utilization and mitigate potential revenue losses due to no-shows.

### Key Concepts of Overbooking Model:


1. **Reservation Demand:**
- Overbooking models rely on historical data and analysis to estimate the probability distribution of
the number of customers who will actually use their reservations (show-up rate). This information is
essential for determining how many additional reservations can be accepted.
2. **No-Show Rate:**
- The no-show rate represents the percentage of customers who make reservations but do not show up
for the service. This rate is a critical parameter in overbooking models.
3. **Overbooking Limit:**
- The overbooking limit is the maximum number of additional reservations accepted beyond the
available capacity. This limit is determined by considering the no-show rate and the desired level of
risk.
4. **Revenue Management:**
- The primary objective of overbooking is to maximize revenue by strategically accepting more
reservations than the actual capacity. Revenue management algorithms are often used to dynamically
adjust overbooking limits based on factors such as demand patterns, historical data, and time remaining
until the service date.
5. **Cancellation and Reaccommodation Policies:**
- Overbooking strategies are often accompanied by flexible cancellation policies and plans for
reaccommodating customers in case the actual demand exceeds the available capacity. This may involve
providing compensation, alternative services, or rescheduling options.
### Mathematical Formulation:
The overbooking decision involves a trade-off between the revenue gained from accepting additional
reservations and the potential costs associated with accommodating excess demand or compensating
customers for denied service. A basic formulation might look like this:
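For instance, with hypothetical parameters (capacity C, fare r per customer who shows up, bump cost b per denied customer, and independent show-ups with probability q), the expected profit of accepting B reservations can be evaluated for each candidate limit and the best one chosen. A minimal Python/SciPy sketch of this trade-off, not a prescribed formula:

```python
import numpy as np
from scipy.stats import binom

C = 100          # physical capacity (seats or rooms) - hypothetical
r = 150.0        # revenue per customer who shows up
b = 400.0        # cost of denying service to one customer who shows up
q = 0.90         # probability that a booked customer shows up

def expected_profit(B):
    """Expected profit when B reservations are accepted and show-ups are Binomial(B, q)."""
    shows = np.arange(B + 1)
    probs = binom.pmf(shows, B, q)
    revenue = r * np.minimum(shows, C)          # at most C customers can be served
    bump_cost = b * np.maximum(shows - C, 0)    # compensation for denied customers
    return float(np.sum(probs * (revenue - bump_cost)))

best_B = max(range(C, C + 31), key=expected_profit)
print(f"Best booking limit: {best_B}, expected profit: {expected_profit(best_B):,.0f}")
```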

### Key Considerations:

1. **Accuracy of Demand Forecasts:**


- Accurate estimation of the no-show rate and demand patterns is crucial for the success of an
overbooking strategy.
2. **Risk Tolerance:**
- Determining the appropriate level of risk is essential. Accepting too many reservations may result in
service disruptions and customer dissatisfaction, while accepting too few may lead to lost revenue
opportunities.
3. **Dynamic Adjustments:**
- Overbooking limits may need to be dynamically adjusted based on real-time information, changes in
demand patterns, and other factors.
4. **Customer Communication:**
- Clear communication with customers is vital. Providing transparent information about overbooking
policies, compensation procedures, and alternative options helps manage customer expectations.
5. **Legal and Ethical Considerations:**
- Overbooking practices should comply with legal regulations and ethical standards. There should be
fair and transparent processes for handling denied boarding situations.

While overbooking can be a profitable strategy when executed judiciously, it requires careful planning
and continuous monitoring to ensure that it aligns with customer expectations and business goals.
Cash Budget Model.
A Cash Budget Model is a financial planning tool that helps organizations forecast and manage their
cash inflows and outflows over a specific period, typically on a monthly or quarterly basis. The primary
goal of a cash budget is to ensure that a business has sufficient liquidity to meet its operational needs,
repay debts, and invest in growth opportunities. This model is crucial for effective cash flow
management and helps businesses avoid liquidity issues.

### Components of a Cash Budget Model:


1. **Cash Receipts:**
- Identify and estimate all sources of cash inflows. This includes revenues from sales, loans,
investments, and any other sources contributing to cash on hand.
2. **Cash Disbursements:**
- List and estimate all anticipated cash outflows. This includes payments for operating expenses,
interest on loans, taxes, salaries, and any other payments that reduce cash.
3. **Opening Cash Balance:**
- Begin with the opening cash balance at the beginning of the budget period. This is the cash available
from the previous period.
4. **Closing Cash Balance:**
- Calculate the closing cash balance by adjusting the opening balance with the net cash inflows and
outflows for the period.
5. **Net Cash Flow:**
- Determine the net cash flow for each period by subtracting total cash disbursements from total cash
receipts.
6. **Borrowings and Repayments:**
- If needed, incorporate any planned borrowings or repayments of loans into the cash budget.

### Mathematical Formulation:
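The heart of the model is a simple period-by-period identity implied by the components above: Closing Balance = Opening Balance + Cash Receipts − Cash Disbursements (plus any borrowings, minus any repayments), and the closing balance of one period becomes the opening balance of the next. A minimal Python sketch with hypothetical monthly figures:

```python
# Hypothetical three-month cash budget (all figures in thousands; no borrowings or repayments).
receipts      = [120, 95, 140]     # cash inflows per month
disbursements = [110, 115, 105]    # cash outflows per month
opening = 30                       # opening cash balance for the first month

for month, (r, d) in enumerate(zip(receipts, disbursements), start=1):
    net_cash_flow = r - d
    closing = opening + net_cash_flow
    print(f"Month {month}: opening {opening}, net cash flow {net_cash_flow:+}, closing {closing}")
    opening = closing              # closing balance carries forward as next month's opening
```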

### Steps in Developing a Cash Budget:


1. **Sales Forecast:**
- Begin with a sales forecast, estimating the expected revenue from sales during the budget period.
2. **Collections from Sales:**
- Estimate the timing of cash collections from sales. This involves considering the credit terms offered
to customers, collection periods, and any delayed payments.

3. **Operating Expenses:**
- Identify and estimate all operating expenses, such as rent, utilities, salaries, and other costs that
require cash payments.
4. **Loan Payments and Interest:**
- Include any loan payments and interest expenses in the cash budget.
5. **Other Cash Inflows and Outflows:**
- Consider any additional cash inflows or outflows, such as investments, asset purchases, or other
financial activities.
6. **Opening and Closing Balances:**
- Determine the opening cash balance for the period, calculate the net cash flow, and compute the
closing cash balance.
7. **Monitoring and Adjusting:**
- Regularly monitor actual cash flows against the budget and make adjustments as needed. This may
involve revising revenue or expense estimates based on changing circumstances.

### Benefits of a Cash Budget Model:


1. **Liquidity Management:**
- Helps organizations manage their liquidity by ensuring that they have enough cash to cover day-to-
day operations and unexpected expenses.
2. **Financial Planning:**
- Facilitates effective financial planning by providing a forward-looking view of cash flows, helping
organizations anticipate and address potential shortfalls.
3. **Debt Management:**
- Assists in managing debt obligations by ensuring that there is sufficient cash to meet loan repayment
schedules and interest payments.
4. **Decision-Making:**
- Provides a basis for strategic decision-making, such as assessing the feasibility of investments,
expansion plans, or timing for major expenditures.
5. **Risk Management:**
- Helps organizations identify and manage financial risks associated with cash flow, allowing for
proactive measures to address potential issues.
A well-constructed cash budget is a valuable tool for businesses of all sizes, providing insights into their
financial health and guiding effective cash flow management strategies. Regular updates and
adjustments based on actual performance contribute to the ongoing financial health of the organization.
Unit 5:
Decision Analysis: Formulating Decision Problems, Decision Strategies with and without Outcome
Probabilities, Decision Trees, The Value of Information, Utility and Decision Making.
Recent Trends in: Embedded and collaborative business intelligence, Visual data recovery, Data
Storytelling and Data journalism
Decision Analysis:
Formulating Decision Problems,

Many decisions involve a choice from among a small set of alternatives with uncertain
consequences. We may formulate such decision problems by defining three things:
1. the decision alternatives that can be chosen,
2. the uncertain events that may occur after a decision is made, along with their possible
outcomes, and
3. the consequences associated with each decision and outcome, which are usually
expressed as payoffs.
The outcomes associated with uncertain events (which are often called states of nature)
are defined so that one and only one of them will occur. They may be quantitative or qualitative.
For instance, in selecting the size of a new factory, the future demand for the product would be
an uncertain event. The demand outcomes might be expressed quantitatively in sales units or
dollars. On the other hand, suppose that you are planning a spring-break vacation to Florida in
January; you might define an uncertain event as the weather, and these outcomes might be
characterized qualitatively: sunny and warm, sunny and cold, rainy and warm, rainy and cold,
and so on. A payoff is a measure of the value of making a decision and having a particular
outcome occur. This might be a simple estimate made judgmentally or a value computed from
a complex spreadsheet model. Payoffs are often summarized in a payoff table, a matrix whose
rows correspond to decisions and whose columns correspond to events. The decision maker
first selects a decision alternative, after which one of the outcomes of the uncertain event occurs,
resulting in the payoff.

Decision Strategies with and without Outcome Probabilities:

DECISION STRATEGIES FOR A MINIMIZE OBJECTIVE:

Aggressive (Optimistic) Strategy. An aggressive decision maker might seek the option
that holds the promise of minimizing the potential loss. For a minimization objective, this
strategy is often called a minimin strategy; that is, we choose the decision that minimizes
the minimum payoff that can occur among all outcomes for each decision. Aggressive decision
makers are often called speculators, particularly in financial arenas, because they increase their
exposure to risk in hopes of increasing their return; while a few may be lucky, most will not do
very well.

Conservative (Pessimistic) Strategy. A conservative decision maker, on the other
hand, might take a more pessimistic attitude and ask, "What is the worst thing that might
result from my decision?" and then select the decision that represents the "best of the worst."
Such a strategy is also known as a minimax strategy, because we seek the decision that
minimizes the largest payoff that can occur among all outcomes for each decision.
Conservative decision makers are willing to forgo high returns to avoid undesirable losses.
This rule typically models the rational behavior of most individuals.

Opportunity-Loss Strategy. A third approach that underlies decision choices for many
individuals is to consider the opportunity loss associated with a decision. Opportunity loss
represents the "regret" that people often feel after making a nonoptimal decision ("I should have
bought that stock years ago!"). In general, the opportunity loss associated with any decision and
event is the absolute difference between the payoff of the best decision for that particular outcome
and the payoff for the decision that was chosen. Opportunity losses can be only nonnegative values;
if you get a negative number, you have made a mistake. Once opportunity losses are computed,
the decision strategy is similar to a conservative strategy: the decision maker selects the
decision that minimizes the largest opportunity loss among all outcomes for each decision. For
this reason, it is also called a minimax regret strategy.

DECISION STRATEGIES FOR A MAXIMIZE OBJECTIVE:

When the objective is to maximize the payoff, we can still apply aggressive, conservative,
and opportunity-loss strategies, but we must make some key changes in the analysis.
(1) For the aggressive strategy, the best payoff for each decision would be the largest
value among all outcomes, and we would choose the decision corresponding to the
largest of these, called a maximax strategy.
(2) For the conservative strategy, the worst payoff for each decision would be the
smallest value among all outcomes, and we would choose the decision
corresponding to the largest of these, called a maximin strategy.
(3) For the opportunity-loss strategy, we need to be careful in calculating the
opportunity losses. With a maximize objective, the decision with the largest value
for a particular event has an opportunity loss of zero. The opportunity losses
associated with the other decisions are the absolute differences between their payoffs
and that largest value. The actual decision rule is the same as when payoffs are costs:
choose the decision that minimizes the maximum opportunity loss.

DECISIONS WITH CONFLICTING OBJECTIVES:

Many decisions require some type of trade-off among conflicting objectives, such as risk
versus reward. A simple decision rule can be used whenever one wishes to make an optimal
trade-off between any two conflicting objectives, one of which is good and one of which is bad:
maximize the ratio of the good objective to the bad. First, display the trade-offs on a
chart with the "good" objective on the x-axis and the "bad" objective on the y-axis, making
sure to scale the axes properly to display the origin (0, 0). Then graph the tangent line to the
trade-off curve that goes through the origin. The point at which the tangent line touches the
curve (which represents the smallest slope) represents the best return-to-risk trade-off.
TABLE 16.1. Summary of Decision Strategies Under Uncertainty

| Objective | Aggressive Strategy | Conservative Strategy | Opportunity-Loss Strategy |
| --- | --- | --- | --- |
| Minimize objective | Find the smallest payoff for each decision among all outcomes, and choose the decision with the smallest of these (minimin). | Find the largest payoff for each decision among all outcomes, and choose the decision with the smallest of these (minimax). | For each outcome, compute the opportunity loss for each decision as the absolute difference between its payoff and the smallest payoff for that outcome. Find the maximum opportunity loss for each decision, and choose the decision with the smallest opportunity loss (minimax regret). |
| Maximize objective | Find the largest payoff for each decision among all outcomes, and choose the decision with the largest of these (maximax). | Find the smallest payoff for each decision among all outcomes, and choose the decision with the largest of these (maximin). | For each outcome, compute the opportunity loss for each decision as the absolute difference between its payoff and the largest payoff for that outcome. Find the maximum opportunity loss for each decision, and choose the decision with the smallest opportunity loss (minimax regret). |
Table 16.1 summarizes the decision rules for both minimize and maximize objectives.
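A minimal Python/NumPy sketch of the rules in Table 16.1 for a maximize objective, using a small hypothetical payoff table (rows are decisions, columns are outcomes):

```python
import numpy as np

decisions = ["Decision 1", "Decision 2", "Decision 3"]
# Hypothetical payoff table (maximize objective): rows are decisions, columns are outcomes.
payoffs = np.array([[ 400,  400,  400],
                    [-500, 1000, 1400],
                    [-900, 1200, 1700]])

aggressive   = decisions[int(np.argmax(payoffs.max(axis=1)))]   # maximax
conservative = decisions[int(np.argmax(payoffs.min(axis=1)))]   # maximin

regret = payoffs.max(axis=0) - payoffs                          # opportunity-loss table
minimax_regret = decisions[int(np.argmin(regret.max(axis=1)))]

print("Aggressive (maximax):  ", aggressive)
print("Conservative (maximin):", conservative)
print("Minimax regret:        ", minimax_regret)
```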
DECISION STRATEGIES WITH OUTCOME PROBABILITIES:

The aggressive, conservative, and opportunity-loss strategies assume no knowledge of
the probabilities associated with future outcomes.

AVERAGE PAYOFF STRATEGY:

If we can assess a probability for each outcome, we can choose the best
decision based on the expected value. For any decision, the expected value is the
sum of the payoffs multiplied by their probabilities, summed over all outcomes.
The simplest case is to assume that each outcome is equally likely to occur; that
is, the probability of each outcome is simply 1/N, where N is the number of
possible outcomes. This is called the average payoff strategy.

EXPECTED VALUE STRATEGY:

A more general case of the average payoff strategy arises when the probabilities of the
outcomes are not all the same. This is called the expected value strategy.

EVALUATING RISK:
An implicit assumption in using the average payoff or expected value strategy is that
the decision is repeated a large number of times.

Decision Trees:

A useful approach to structuring a decision problem involving uncertainty is to use a
graphical model called a decision tree. Decision trees consist of a set of nodes and branches.
Nodes are points in time at which events take place. The event can be a selection of a decision
from among several alternatives, represented by a decision node, or an outcome over which
the decision maker has no control, an event node. Event nodes are conventionally depicted
by circles, and decision nodes by squares. Branches are associated with decisions and events.
Many decision makers find decision trees useful because sequences of decisions and outcomes
over time can be modeled easily.

Decision trees may be created in Excel using Analytic Solver Platform. Click the
Decision Tree button. To add a node, select Add Node from the Node drop-down list, as shown
in Figure 16.2. Click on the radio button for the type of node you wish to create (decision or
event). This displays one of the dialogs shown in Figure 16.3. For a decision node, enter
the name of the node and the names of the branches that emanate from the node (you may also
add additional ones). The Value field can be used to input cash flows, costs, or revenues that
result from choosing a particular branch. For an event node, enter the name of the node and
its branches; the Chance field allows you to enter the probabilities of the events.

FIG. 16.2. Decision Tree Menu in Analytic Solver Platform

FIG. 16.3. Decision Tree Dialogs for Decisions and Events
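Independently of the spreadsheet tool, the computation a decision tree performs is a roll-back of expected values: event nodes are averaged using their branch probabilities, and decision nodes take the best branch. A minimal Python sketch on a small hypothetical tree (the product-launch example and all its numbers are purely illustrative):

```python
# A node is either ("decision", {branch_name: (branch_value, child)}) or
# ("event", [(probability, branch_value, child), ...]); child=None marks a terminal branch.
def rollback(node):
    """Return the expected value of a decision tree by backward induction."""
    if node is None:
        return 0.0
    kind, branches = node
    if kind == "event":
        return sum(p * (v + rollback(child)) for p, v, child in branches)
    # Decision node: pick the branch with the highest expected value.
    return max(v + rollback(child) for v, child in branches.values())

# Hypothetical tree: launch a new product (uncertain demand) or keep the current line.
tree = ("decision", {
    "launch":    (-200, ("event", [(0.6, 600, None),     # high demand
                                   (0.4, 100, None)])),  # low demand
    "keep line": (150, None),
})
print(rollback(tree))   # expected value of the best strategy
```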

DECISION TREES AND MONTE CARLO SIMULATION:

Because all computations use Excel formulas, you could easily perform what-if
analysis or create data tables to analyze changes in the assumptions of the model. One of
the interesting features of decision trees in Analytic Solver Platform is that you can also
use the Excel model to develop a Monte Carlo simulation or an optimization model based
on the decision tree.

DECISION TREES AND RISK:

The decision tree approach is an example of expected value decision making. Thus, in
the drug-development example, if the company's portfolio of drug-development projects has
similar characteristics, then pursuing further development is justified on an expected value
basis. However, this approach does not explicitly consider risk.
Each decision strategy has an associated payoff distribution, called a risk profile.
Risk profiles show the possible payoff values that can occur and their probabilities.

SENSITIVITY ANALYSIS IN DECISION TREES:

We may use Excel data tables to investigate the sensitivity of the optimal decision to
changes in probabilities or payoff values.

The Value of Information:

When we deal with uncertain outcomes, it is logical to try to obtain better information
about their likelihood of occurrence before making a decision. The value of information
represents the improvement in the expected return that can be achieved if the decision maker
is able to acquire, before making a decision, additional information about the future event that
will take place. In the ideal case, we would like to have perfect information, which tells us
with certainty what outcome will occur. Although this will never happen, it is useful to know
the value of perfect information because it provides an upper bound on the value of any
information that we may acquire. The expected value of perfect information (EVPI) is the
expected value with perfect information (assumed to come at no cost) minus the expected value
without any information; it represents the most you should be willing to pay for perfect
information.

The expected opportunity loss represents the average additional amount the decision
maker would have achieved by making the right decision instead of a wrong one. To find the
expected opportunity loss, we create an opportunity-loss table, as discussed earlier in this
chapter, and then find the expected value for each decision. It will always be true that the
decision having the best expected value will also have the minimum expected opportunity loss.
This minimum expected opportunity loss is the EVPI.
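A minimal NumPy sketch of EVPI on a hypothetical payoff table (maximize objective), computed both ways described above to show that they agree:

```python
import numpy as np

# Hypothetical payoffs: rows are decisions, columns are outcomes.
payoffs = np.array([[ 400,  400,  400],
                    [-500, 1000, 1400],
                    [-900, 1200, 1700]])
probs = np.array([0.3, 0.5, 0.2])     # assessed outcome probabilities

ev_without_info = (payoffs * probs).sum(axis=1).max()        # best expected value
ev_with_perfect_info = (payoffs.max(axis=0) * probs).sum()   # best decision for each outcome
evpi = ev_with_perfect_info - ev_without_info

# Equivalent computation: the minimum expected opportunity loss.
regret = payoffs.max(axis=0) - payoffs
min_expected_opportunity_loss = (regret * probs).sum(axis=1).min()

print(evpi, min_expected_opportunity_loss)   # the two values coincide
```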

DECISIONS WITH SAMPLE INFORMATION:

Sample information is the result of conducting some type of experiment, such as
a market research study or interviewing an expert. Sample information is always
imperfect, and it often comes at a cost. Thus, it is useful to know how much we
should be willing to pay for it. The expected value of sample information (EVSI) is
the expected value with sample information (assumed to come at no cost) minus the
expected value without sample information; it represents the most you should be
willing to pay for the sample information.
BAYES' RULE:

Bayes' rule extends the concept of conditional probability to revise historical
probabilities based on sample information. Suppose that A1, A2, ..., Ak is a set of
mutually exclusive and collectively exhaustive events, and we seek the probability
that some event Ai occurs given that another event B has occurred. Bayes' rule is
stated as follows:

P(Ai | B) = P(B | Ai) P(Ai) / [P(B | A1) P(A1) + P(B | A2) P(A2) + ... + P(B | Ak) P(Ak)]
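A minimal Python sketch of how Bayes' rule revises prior outcome probabilities using sample information; all of the probabilities below are hypothetical (for example, a market survey whose reliability is expressed as conditional probabilities):

```python
def bayes_revision(priors, likelihoods):
    """Return posterior probabilities P(Ai | B) given priors P(Ai) and likelihoods P(B | Ai)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                       # P(B), the denominator of Bayes' rule
    return [j / total for j in joint]

# Hypothetical: prior probabilities of low/medium/high demand, and the probability
# that a market survey comes back "favorable" under each demand state.
priors = [0.3, 0.5, 0.2]
p_favorable_given_state = [0.1, 0.6, 0.9]

posteriors = bayes_revision(priors, p_favorable_given_state)
print([round(p, 3) for p in posteriors])
```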

Utility and Decision Making:

An approach for assessing risk attitudes quantitatively is called utility theory.
This approach quantifies a decision maker's relative preferences for particular
outcomes. We can determine an individual's utility function by posing a series of
decision scenarios.

CONSTRUCTING A UTILITY FUNCTION:

A utility function may be used instead of the actual monetary payoffs in a
decision analysis by simply replacing the payoffs with their equivalent utilities and
then computing expected values. The expected utilities and the corresponding
optimal decision strategy then reflect the decision maker's preferences toward risk.
For example, if we use the average payoff strategy (because no probabilities of
events are given) for the data in Table 16.2, the best decision would be to choose the
stock fund. However, if we replace the payoffs in Table 16.2 with the (risk-averse)
utilities that we defined and again use the average payoff strategy, the best decision
would be to choose the bank CD rather than the stock fund, as shown in the
following table.

| Decision/Event | Rates Rise | Rates Stable | Rates Fall | Average Utility |
| --- | --- | --- | --- | --- |
| Bank CD | 0.75 | 0.75 | 0.75 | 0.75 |
| Bond fund | 0.35 | 0.85 | 0.90 | 0.70 |
| Stock fund | 0.00 | 0.80 | 1.00 | 0.60 |

EXPONENTIAL UTILITY FUNCTION:

If assessments of event probabilities are available, these can be used to compute the
expected utility and identify the best decision. It can be rather difficult to construct a
utility function, especially for situations involving a large number of payoffs.
Because most decision makers are typically risk averse, we may use an exponential
utility function to approximate the true utility function.
The exponential utility function is

U(x) = 1 − e^(−x/R)    (16.2)

where e is the base of the natural logarithm (2.71828...) and R is a shape parameter
that is a measure of risk tolerance.

Figure 16.14 shows several examples of U(x) for different values of R. Notice that all these
functions are concave and that as R increases, the functions become flatter, indicating more
tendency toward risk neutrality.
One approach to estimating a reasonable value of R is to find the maximum payoff
$R for which the decision maker is willing to take an equal chance on winning $R or losing
$R/2. The smaller the value of R, the more risk averse the individual is. For instance,
would you take a bet on winning $10 versus losing $5? How about winning $10,000
versus losing $5,000? Most people probably would not worry about taking the first gamble
but might think twice about the second. Finding one's maximum comfort level establishes
the utility function.
Fig. 16.14. Examples of Exponential Utility Functions
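A minimal Python sketch of equation (16.2), assuming a hypothetical risk tolerance R and the win-$10,000/lose-$5,000 gamble mentioned above; it shows how expected utility, rather than expected monetary value, would rank the gamble against doing nothing:

```python
import numpy as np

def exp_utility(x, R):
    """Exponential utility U(x) = 1 - e^(-x/R); R is the risk-tolerance parameter."""
    return 1 - np.exp(-np.asarray(x, dtype=float) / R)

R = 5000.0                                  # hypothetical risk tolerance
payoffs = np.array([10000, -5000])          # win or lose on an even-chance gamble
probs = np.array([0.5, 0.5])

expected_value = float((payoffs * probs).sum())                    # +2,500 in money terms
expected_utility = float((exp_utility(payoffs, R) * probs).sum())
utility_of_sure_zero = float(exp_utility(0, R))

print(expected_value, round(expected_utility, 3), round(utility_of_sure_zero, 3))
# A risk-averse decision maker prefers the sure $0 whenever its utility exceeds the gamble's.
```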

Recent Trends in:
Embedded and collaborative business intelligence:
Trends in embedded and collaborative business intelligence (BI) have been evolving to meet
the growing demands for data-driven decision-making and enhanced user experiences. Some
of the key trends are outlined below.
1. **Embedded Analytics and BI:**
- **Integration with Applications:** There is a growing trend of embedding analytics
directly into other business applications. This integration allows users to access analytics and
BI tools seamlessly within the applications they already use, promoting a more unified user
experience.
- **Customization for Specific Use Cases:** Businesses are customizing embedded
analytics to fit specific use cases and industry needs. This involves tailoring analytics
solutions to the unique requirements of different user groups and verticals.
2. **Collaborative BI:**
- **Social Collaboration Features:** Collaborative BI platforms incorporate social
collaboration features, enabling users to share insights, annotations, and comments within the
BI environment. This fosters teamwork, knowledge sharing, and more informed decision-
making.
- **Real-Time Collaboration:** There is a shift towards real-time collaboration, allowing
multiple users to work on and interact with BI content simultaneously. This trend supports
dynamic discussions and collaborative analysis.
- **Integration with Communication Tools:** Integration with communication tools,
such as messaging apps and collaboration platforms, enhances the flow of information and
insights among team members.
3. **Self-Service BI:**
- **Empowering Non-Technical Users:** The emphasis on self-service BI continues,
with a focus on empowering non-technical users to create their own reports, dashboards, and
visualizations. This trend reduces dependence on IT teams for routine analytics tasks.
- **User-Friendly Interfaces:** BI tools are becoming more user-friendly with intuitive
interfaces, drag-and-drop functionality, and natural language processing. This facilitates
easier adoption and usage by business users.
4. **Mobile BI:**
- **Mobile-First Approach:** With the increasing use of smartphones and tablets, BI
vendors are adopting a mobile-first approach. Mobile BI enables users to access and interact
with analytics on the go, providing flexibility and ensuring that decision-makers are not tied
to their desks.
- **Responsive Design:** BI tools are incorporating responsive design principles to ensure
a consistent and user-friendly experience across various devices, screen sizes, and
orientations.
5. **AI and Machine Learning Integration:**
- **Automated Insights:** AI and machine learning capabilities are being integrated into
BI tools to automate insights generation. This helps users discover patterns, trends, and
anomalies without explicitly querying the data.
- **Predictive Analytics:** Predictive analytics, powered by machine learning algorithms,
is becoming more prevalent in BI. Businesses are using these capabilities to anticipate future
trends and make proactive decisions.
6. **Data Governance and Security:**
- **Focus on Data Governance:** As data privacy regulations become more stringent,
there is an increased focus on data governance within BI platforms. This includes features for
data lineage, data quality monitoring, and access controls to ensure compliance.
- **Embedded Security Measures:** Security features are being embedded directly into
BI solutions to protect sensitive information. This includes encryption, authentication, and
authorization mechanisms.
7. **Cloud-Based BI:**
- **Rise of Cloud-Based Solutions:** Cloud-based BI solutions are gaining popularity
due to their scalability, flexibility, and ease of implementation. Organizations are adopting
cloud-based BI to leverage the advantages of cloud infrastructure.
- **Hybrid Deployments:** Some businesses are opting for hybrid BI deployments,
combining on-premises and cloud solutions to meet specific performance, security, or
compliance requirements.
8. **Natural Language Processing (NLP):**
- **Querying with Natural Language:** NLP capabilities enable users to interact with BI
tools using natural language queries. This simplifies the process of data exploration and
analysis, making BI more accessible to a broader audience.
9. **Continuous Analytics:**
- **Real-Time Analytics:** The demand for real-time analytics is increasing, particularly
in industries where immediate insights are crucial. Continuous analytics enables
organizations to monitor and analyze data streams in real-time, leading to more timely
decision-making.
10. **Integration with Big Data Technologies:**
- **Handling Large Datasets:** As organizations deal with larger and more complex
datasets, BI tools are integrating with big data technologies to efficiently process and analyze
vast amounts of data.
- **Support for Data Variety:** BI platforms are evolving to handle diverse data types,
including structured, semi-structured, and unstructured data, enabling a more comprehensive
view of business information.

These trends collectively reflect a shift towards more user-centric, collaborative, and
intelligent BI solutions. Keep in mind that the field of embedded and collaborative BI is
dynamic, and ongoing technological advancements will likely influence the trajectory of
these trends.
Visual data recovery:
Trends in visual data recovery have been shaped by advances in computer vision, machine
learning, and image processing technologies. Some of the notable trends are outlined below.

1. **Deep Learning for Image Restoration:**


- **Generative Adversarial Networks (GANs):** Deep learning, particularly GANs, has
been increasingly used for image restoration and recovery. GANs can generate high-quality
and realistic images, making them valuable for tasks like image denoising, super-resolution,
and inpainting.
2. **AI-Based Image Inpainting:**
- **Context-Aware Algorithms:** AI-driven inpainting algorithms have become more
sophisticated, taking contextual information into account for filling in missing parts of
images. These algorithms can generate more realistic and contextually accurate visual
recoveries.
3. **Enhanced Super-Resolution Techniques:**
- **Single Image Super-Resolution (SISR):** Advances in super-resolution techniques
for visual recovery have been notable. SISR methods leverage deep learning to enhance the
resolution of images, allowing for the recovery of finer details.
4. **Video Frame Interpolation:**
- **Motion-Aware Algorithms:** In video processing, frame interpolation techniques
have improved, enabling the generation of intermediate frames to enhance the visual quality
of videos. These algorithms consider motion patterns to create smooth transitions between
frames.
5. **Explainable AI in Image Recovery:**
- **Interpretable Models:** There is a growing emphasis on making visual data recovery
models more interpretable and explainable. Understanding how models arrive at certain
visual recovery decisions is crucial for building trust and ensuring accountability.
6. **Real-Time Image and Video Recovery:**
- **Efficiency and Speed:** Recent trends include a focus on developing visual recovery
solutions that operate in real-time. This is particularly important for applications where quick
decision-making or feedback is essential.
7. **Multi-Modal Visual Recovery:**
- **Integration of Different Modalities:** Visual recovery techniques are increasingly
integrating information from multiple modalities (e.g., RGB images, depth maps, infrared).
This fusion enhances the ability to recover visual information in diverse environments and
conditions.
8. **Edge and IoT Applications:**
- **On-Device Processing:** There is a trend towards deploying visual recovery models
on edge devices and within the Internet of Things (IoT) ecosystem. On-device processing
reduces latency and enhances privacy by minimizing the need for transmitting sensitive
visual data.
9. **Domain-Specific Visual Recovery:**
- **Customized Solutions:** Visual recovery methods are becoming more specialized for
particular domains such as medical imaging, satellite imagery, and surveillance. Customized
solutions take into account the unique characteristics and requirements of each domain.
10. **Privacy-Preserving Techniques:**
- **Privacy-Aware Algorithms:** Privacy concerns have led to the development of
visual recovery techniques that can operate while preserving the privacy of individuals in the
data. This is crucial for applications in surveillance and healthcare.
11. **Crowdsourced Visual Data Recovery:**
- **Human-in-the-Loop Approaches:** Some visual recovery applications are
incorporating crowdsourcing or human-in-the-loop approaches. Humans provide input or
validation to enhance the accuracy and quality of visual recovery results.
12. **Cross-Modal Learning:**
- **Learning Across Modalities:** Techniques that involve learning from different
modalities (e.g., using textual descriptions to aid image recovery) are gaining attention for
their potential to improve visual recovery results.

It's important to stay updated with the latest research and industry advancements in visual
data recovery, as this field is dynamic, and new trends may emerge over time.
Data Storytelling and Data Journalism:
Data storytelling and data journalism continue to evolve with advances in technology and
changing data-consumption patterns. Some of the key trends are outlined below.
1. **Interactive Data Visualizations:**
- **User Engagement:** Data storytelling increasingly involves interactive data
visualizations. These visuals allow users to explore data on their own, enhancing engagement
and understanding. Tools like D3.js, Tableau, and Power BI facilitate the creation of
interactive dashboards.
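As a lightweight, code-based counterpart to the dashboarding tools mentioned above, the sketch below builds an interactive chart with Plotly Express in Python; the sample data is invented purely for illustration.

```python
# Minimal sketch: an interactive bar chart with Plotly Express.
# Hovering, zooming, and panning come for free; the sample data is invented
# purely for illustration and is not drawn from any real dataset.
import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    "region": ["North", "South", "East", "West"],
    "revenue": [120, 95, 140, 110],
})

fig = px.bar(df, x="region", y="revenue", title="Revenue by Region (sample data)")
fig.show()   # opens an interactive chart in the browser or notebook
```

Libraries such as D3.js offer finer control but require writing JavaScript; Python charting libraries like this trade some flexibility for speed of authoring.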

2. **Augmented Reality (AR) and Virtual Reality (VR):**
- **Immersive Experiences:** AR and VR technologies are being explored for creating
immersive data storytelling experiences. These technologies offer new ways to present and
interact with data, providing a more engaging and memorable experience.
3. **Explainable AI in Data Storytelling:**
- **Interpretable Models:** With the increased use of AI in data analysis, there's a
growing emphasis on making the underlying models more interpretable. This is crucial for
data journalists and storytellers to explain complex insights in a comprehensible manner.
4. **Natural Language Generation (NLG):**
- **Automated Narrative Creation:** NLG tools are being used to automatically
generate narratives from structured data. This trend aids in producing written explanations of
data trends and patterns, saving time for data journalists and storytellers.
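Full NLG systems rely on trained language models, but the underlying pattern can be shown with a simple template that turns summary statistics into a sentence. The column names and figures below are assumptions made up for this example.

```python
# Minimal sketch: template-based narrative generation from structured data.
# Real NLG tools use trained language models, but the pattern is similar:
# compute summary statistics, then render them into readable prose.
import pandas as pd

sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],     # invented sample data
    "units": [340, 410, 385],
})

best = sales.loc[sales["units"].idxmax()]
change = sales["units"].iloc[-1] - sales["units"].iloc[0]
direction = "rose" if change > 0 else "fell"

narrative = (
    f"Unit sales {direction} by {abs(change)} between {sales['month'].iloc[0]} "
    f"and {sales['month'].iloc[-1]}, peaking at {best['units']} units in {best['month']}."
)
print(narrative)
```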
5. **Data Collaboration Platforms:**
- **Collaborative Environments:** Platforms that facilitate collaboration among data
professionals, journalists, and decision-makers are gaining popularity. These environments
allow multiple contributors to work together on data-driven stories in real-time.
6. **Data Literacy Initiatives:**
- **Promoting Data Literacy:** There's an increased focus on promoting data literacy
among journalists and the general public. Training programs and initiatives aim to empower
individuals to understand and communicate with data effectively.
7. **Data Ethics in Reporting:**
- **Ethical Considerations:** Data journalists are paying more attention to the ethical
implications of their work. This includes ensuring the responsible use of data, avoiding bias,
and transparently communicating the limitations of datasets.
8. **Podcasts and Multimedia Storytelling:**
- **Audio Narratives:** Podcasts and other audio formats are being used for data
storytelling. This trend caters to audiences who prefer consuming information through audio
channels, offering an alternative to traditional visual formats.
9. **Social Media Integration:**
- **Storytelling on Social Platforms:** Data storytelling is increasingly integrated into
social media platforms. Short-form content, infographics, and data-driven stories are crafted
for platforms like Instagram, Twitter, and LinkedIn to reach wider audiences.
10. **Live Data Reporting:**
- **Real-Time Reporting:** Some data journalism initiatives involve live reporting and
continuous updates as events unfold. This real-time approach allows journalists to provide
up-to-the-minute insights and analysis.
11. **Cross-Collaboration with Developers:**
- **Collaboration with Tech Teams:** Data journalists are collaborating more closely
with developers and data scientists. This collaboration enhances the technical capabilities of
data storytelling projects and results in more sophisticated and interactive visualizations.
12. **Data-Driven Newsrooms:**
- **Integration of Data Teams:** Newsrooms are increasingly integrating data teams
into their structures. Data professionals work alongside traditional journalists, contributing to
investigative reporting and enriching stories with data-driven insights.
13. **Focus on Local Data Stories:**
- **Hyperlocal Reporting:** Data journalism is placing a greater emphasis on local and
community-level stories. This trend involves using data to tell stories that directly impact
specific regions or communities.
14. **Transparency and Fact-Checking:**
- **Fact-Checking Initiatives:** Given the prevalence of misinformation, data
journalism is increasingly focusing on fact-checking and promoting transparency in data
sources. Building trust with the audience is a key consideration.
15. **Personalization in Data Stories:**
- **Tailored Content:** Data stories are becoming more personalized, catering to the
specific interests and needs of different audience segments. This trend involves using data to
create content that resonates with diverse audiences.
It's important to stay updated with the latest developments in data storytelling and data
journalism, as these fields continue to evolve with emerging technologies and changing
audience preferences.
