BA Material
Business Analytics (BA) is the practice of using data analysis and statistical methods to derive
insights and make informed business decisions.
It involves using various tools, techniques, and technologies to collect, process, and analyze data, with the ultimate goal of providing valuable information to support organizational decision-making. BA is commonly divided into three types:
1. **Descriptive Analytics:** This involves the use of historical data to understand and describe
what has happened in the past. It includes techniques such as data visualization, reporting, and
summarization to provide a clear picture of historical trends and patterns.
2. **Predictive Analytics:** Predictive analytics involves the use of statistical algorithms and
machine learning techniques to identify patterns and make predictions about future events. This can
help organizations anticipate trends, forecast demand, and make proactive decisions.
3. **Prescriptive Analytics:** This form of analytics goes beyond predicting future outcomes and
provides recommendations on what actions to take to optimize a given situation. It involves the use
of optimization and simulation techniques to determine the best course of action.
Business Analytics is applied across various functional areas within an organization, including
finance, marketing, operations, human resources, and more. It plays a crucial role in gaining a
competitive advantage, improving operational efficiency, identifying new business opportunities,
and mitigating risks.
Key components of Business Analytics include data collection, data cleaning and preprocessing,
exploratory data analysis, modeling, and interpretation of results. Businesses often use specialized
tools and platforms, such as data visualization tools, statistical software, and business intelligence
platforms, to carry out these tasks.
Scope of Business Analytics:
The scope of Business Analytics is expansive and continues to evolve as technology advances and organizations recognize the value of data-driven decision-making. A foundational aspect of that scope is the range of data types BA works with, summarized below by measurement scale (see Table 1.4):
| Data Type | Description and Examples |
|---|---|
| **Categorical Data** | Data that is grouped by one or more characteristics. Categorical data usually involves cardinal numbers counted or expressed as percentages. Example: product markets characterized by categories of “high-end” or “low-income” products, based on dollar sales. The term is commonly applied to data sets containing items identified by categories, as well as to observations summarized in cross-tabulations or contingency tables. |
| **Ordinal Data** | Data that is ranked or ordered to show relational preference. Example 1: football team rankings based not on points scored but on wins. Example 2: ranking of business firms based on product quality. |
| **Interval Data** | Data arranged along a scale in which each value is equally distant from the others. It is ordered like ordinal data, but the intervals between values are equal. Example 1: a temperature gauge. Example 2: a survey instrument using a Likert scale (1, 2, 3, 4, 5, 6, 7), where the interval from 1 to 2 is perceived as equal to the interval from 2 to 3, and so on. Note: in ordinal data, the gap between first and second place might vary greatly, whereas in interval data the gaps are proportional. |
| **Ratio Data** | Data expressed as a ratio on a continuous scale with a meaningful zero. Example: the ratio of firms with green manufacturing programs is twice that of firms without such a program. |
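As a rough illustration of how these measurement scales can be represented in software, the following sketch uses pandas (a tool choice assumed here, not prescribed by this material) on a small made-up dataset; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical mini dataset illustrating the four measurement scales
df = pd.DataFrame({
    "market_segment": ["high-end", "low-income", "high-end"],  # categorical
    "quality_rank":   [1, 3, 2],                               # ordinal (rank order only)
    "satisfaction":   [7, 4, 6],                               # interval (Likert 1-7)
    "sales_dollars":  [120000.0, 35000.0, 98000.0],            # ratio (true zero)
})

# Categorical data: unordered labels
df["market_segment"] = df["market_segment"].astype("category")

# Ordinal data: ordered categories (1 = best rank)
df["quality_rank"] = pd.Categorical(df["quality_rank"], categories=[1, 2, 3], ordered=True)

print(df.dtypes)
# Sums and ratios are only meaningful for ratio-scaled data such as sales_dollars
print(df["sales_dollars"].sum())
```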
From Step 1 in the Descriptive Analytic analysis (see Figure 1.1), some patterns or variables of
business behavior should be identified representing targets of business opportunities and possible
(but not yet defined) future trend behavior. Additional effort (more mining) might be required, such
as the generation of detailed statistical reports narrowly focused on the data related to targets of
business opportunities to explain what is taking place in the data (what happened in the past). This
is, in effect, a statistical search for predictive variables whose patterns of behavior a firm might take advantage of if they recur in the future. For example, a firm might
find in its general sales information that during economic downtimes, certain products are sold to
customers of a particular income level if certain advertising is undertaken. The sales, customers,
and advertising variables may be in the form of any of the measurable scales for data in Table 1.4,
but they have to meet the three conditions of BA previously mentioned: clear relevancy to business,
an implementable resulting insight, and performance and value measurement capabilities.
To determine whether observed trends and behavior found in the relationships of the descriptive
analysis of Step 1 actually exist or hold true and can be used to forecast or predict the future, more
advanced analysis is undertaken in Step 2, Predictive Analytic analysis, of the BA process. There
are many methods that can be used in this step of the BA process. A commonly used methodology
is multiple regression. (See Appendix A, “Statistical Tools,” and Appendix E, “Forecasting,” for a
discussion on multiple regression and ANOVA testing.) This methodology is ideal for establishing
whether a statistical relationship exists between the predictive variables found in the descriptive
analysis. The relationship might be to show that a dependent variable is predictively associated with
business value or performance of some kind. For example, a firm might want to determine which of
several promotion efforts (independent variables measured and represented in the model by dollars
in TV ads, radio ads, personal selling, and/or magazine ads) is most efficient in generating customer
sale dollars (the dependent variable and a measure of business performance). Care would have to be
taken to ensure the multiple regression model was used in a valid and reliable way, which is why
ANOVA and other statistical confirmatory analyses are used to support the model development.
Exploring a database using advanced statistical procedures to verify and confirm the best predictive
variables is an important part of this step in the BA process. It answers the questions of what is currently happening and why, in terms of the relationships between the variables in the model.
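To make the idea concrete, here is a minimal sketch of such a multiple regression using Python's statsmodels library (the tool choice, variable names, and generated data are assumptions for illustration, not the Appendix A procedure itself); the summary output includes the F-statistic and t-tests used to confirm the model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 60

# Hypothetical monthly promotion spending (in $000) for four media
df = pd.DataFrame({
    "tv_ads": rng.uniform(10, 100, n),
    "radio_ads": rng.uniform(5, 50, n),
    "personal_selling": rng.uniform(20, 80, n),
    "magazine_ads": rng.uniform(5, 40, n),
})
# Hypothetical sales driven mainly by TV and radio, plus noise
df["sales"] = (50 + 3.0 * df["tv_ads"] + 1.5 * df["radio_ads"]
               + 0.5 * df["personal_selling"] + rng.normal(0, 25, n))

# Multiple regression: sales ~ promotion variables
X = sm.add_constant(df[["tv_ads", "radio_ads", "personal_selling", "magazine_ads"]])
model = sm.OLS(df["sales"], X).fit()
print(model.summary())  # includes F-statistic (ANOVA) and per-variable t-tests
```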
A single or multiple regression model can often forecast a trend line into the future. When
regression is not practical, other forecasting methods (exponential smoothing, moving averages)
can be applied as predictive analytics to develop needed forecasts of business trends. (See
Appendix E.) The identification of future trends is the main output of Step 2 and the predictive
analytics used to find them. This helps answer the question of what will happen.
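As a small sketch of the kind of forecasting mentioned here, the following implements simple exponential smoothing by hand; the sales series and the smoothing constant alpha are invented for illustration, and a dedicated forecasting library would normally be used in practice.

```python
# Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_(t-1)
def exponential_smoothing(series, alpha=0.3):
    smoothed = [series[0]]  # initialize with the first observation
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical monthly sales figures (in $000)
sales = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]

smoothed = exponential_smoothing(sales, alpha=0.3)
print(f"Forecast for next month: {smoothed[-1]:.1f}")  # naive one-step-ahead forecast
```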
If a firm knows where the future lies by forecasting trends as they would in Step 2 of the BA
process, it can then take advantage of any possible opportunities predicted in that future state. In
Step 3, Prescriptive Analytics analysis, operations research methodologies can be used to optimally
allocate a firm’s limited resources to take best advantage of the opportunities it found in the
predicted future trends. Limits on human, technology, and financial resources prevent any firm
from going after all opportunities they may have available at any one time. Using prescriptive
analytics allows the firm to allocate limited resources to optimally achieve objectives as fully as
possible. For example, linear programming (a constrained optimization methodology) has been
used to maximize the profit in the design of supply chains (Paksoy et al., 2013). (Note: Linear
programming and other optimization methods are presented in Appendixes B, “Linear
Programming,” C, “Duality and Sensitivity Analysis in Linear Programming,” and D, “Integer
Programming.”) This third step in the BA process answers the question of how best to allocate and
manage decision-making in the future.
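A minimal sketch of the constrained optimization idea follows, using scipy's linprog; the two products, profit coefficients, and resource limits are invented for illustration and are not taken from the Paksoy et al. supply chain study.

```python
from scipy.optimize import linprog

# Maximize profit = 40*x1 + 30*x2 for two hypothetical products.
# linprog minimizes, so the objective coefficients are negated.
c = [-40, -30]

# Resource constraints (limited labor hours and material units)
A_ub = [
    [2, 1],   # labor:    2*x1 + 1*x2 <= 100 hours
    [1, 3],   # material: 1*x1 + 3*x2 <= 90 units
]
b_ub = [100, 90]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("Optimal production plan:", result.x)
print("Maximum profit:", -result.fun)
```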
Relationship of the Business Analytics Process and the Organization
The BA process can solve problems and identify opportunities to improve business performance. In
the process, organizations may also determine strategies to guide operations and help achieve
competitive advantages. Typically, solving problems and identifying strategic opportunities to
follow are organization decision-making tasks. The latter, identifying opportunities, can be viewed
as a problem of strategy choice requiring a solution. It should come as no surprise that the BA
process described in Section 1.2 closely parallels classic organization decision-making processes.
As depicted in Figure 1.2, the business analytic process has an inherent relationship to the steps in
typical organization decision-making processes.
Statistical Notation
Statistical notation uses standard symbols (for example, \(\bar{X}\) for a sample mean, \(\mu\) for a population mean, \(\sigma\) for a population standard deviation, and \(n\) for sample size). These notations are widely used in statistical literature, research papers, and educational materials to represent statistical concepts and calculations concisely and consistently. Understanding and using statistical notation is crucial for effective communication in the field of statistics.
Descriptive Statistical Methods
Descriptive statistics are methods used to summarize and describe the main features of a dataset.
These methods provide a way to organize and simplify large amounts of data, making it more
understandable. Here are some common descriptive statistical methods:
1. **Measures of Central Tendency:**
- **Mean (Average):** The sum of all values divided by the number of values in the dataset.
- **Median:** The middle value in a dataset when it is ordered. It is less sensitive to extreme values than the mean.
2. **Measures of Dispersion:**
- **Variance:** The average of the squared deviations of data points from the mean.
- **Standard Deviation:** The square root of the variance, providing a measure of the average distance of data points from the mean.
3. **Frequency Distributions:**
- **Frequency:** The number of times a particular value occurs in a dataset.
- **Relative Frequency:** The proportion of times a value occurs relative to the total number of
observations.
- **Histograms:** A graphical representation of the distribution of a dataset, showing the
frequency of different values.
4. **Percentiles and Quartiles:**
- **Percentiles:** Values below which a given percentage of data falls. The 50th percentile is the
median.
- **Quartiles:** Values that divide a dataset into four equal parts. The first quartile (Q1) is the
25th percentile, the second quartile (Q2) is the median, and the third quartile (Q3) is the 75th
percentile.
5. **Skewness and Kurtosis:**
- **Skewness:** Measures the asymmetry of a distribution. A skewness of 0 indicates a perfectly
symmetrical distribution.
- **Kurtosis:** Measures the "tailedness" of a distribution. Positive kurtosis indicates heavier
tails, and negative kurtosis indicates lighter tails compared to a normal distribution.
6. **Correlation Coefficient:**
- **Pearson Correlation Coefficient (r):** Measures the strength and direction of a linear
relationship between two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect
positive correlation).
7. **Central Limit Theorem:**
- **Central Limit Theorem (CLT):** States that, for a large enough sample size, the distribution
of the sample mean will be approximately normally distributed, regardless of the distribution of the
original population.
These descriptive statistical methods provide valuable insights into the characteristics of a dataset,
helping researchers and analysts summarize and interpret data effectively. They are essential for
understanding the basic properties of data before moving on to more advanced statistical analyses.
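The sketch below computes several of the measures listed above with pandas and scipy on a small hypothetical revenue sample; the numbers are made up for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical monthly revenue figures (in $000)
revenue = pd.Series([210, 225, 198, 240, 255, 230, 310, 205, 222, 218])

print("Mean:             ", revenue.mean())
print("Median:           ", revenue.median())
print("Std deviation:    ", revenue.std())
print("Quartiles:\n", revenue.quantile([0.25, 0.50, 0.75]))
print("Skewness:         ", revenue.skew())
print("Kurtosis (excess):", revenue.kurtosis())

# Pearson correlation with a second hypothetical variable (ad spend, in $000)
ad_spend = pd.Series([20, 22, 18, 25, 27, 23, 35, 19, 21, 20])
r, p_value = stats.pearsonr(ad_spend, revenue)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
```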
Review of probability distribution and data modelling
**Review of Probability Distribution:**
**Definition:**
A probability distribution describes how the values of a random variable are spread or distributed
across different outcomes. It provides the likelihood of each possible outcome in a sample space.
**Key Concepts:**
1. **Discrete Probability Distribution:**
- Describes the probabilities associated with discrete random variables. The probabilities are
assigned to individual values.
2. **Continuous Probability Distribution:**
- Describes the probabilities associated with continuous random variables. Instead of individual
values, probabilities are assigned to ranges of values.
3. **Probability Mass Function (PMF):**
- For discrete random variables, the probability mass function gives the probability of each
possible value. It is often denoted as P(X = x).
4. **Probability Density Function (PDF):**
- For continuous random variables, the probability density function gives the probability density
at a given point. The probability of an event occurring within a given range is found by integrating
the PDF over that range.
5. **Cumulative Distribution Function (CDF):**
- The cumulative distribution function gives the probability that a random variable takes a value
less than or equal to a given value. It is denoted as F(x) for both discrete and continuous random
variables.
6. **Expected Value (Mean):**
- Represents the average of a random variable's possible values, weighted by their probabilities. For a discrete random variable, it is \(E(X) = \sum_x x \, P(X = x)\).
7. **Variance and Standard Deviation:**
- Variance measures the spread of values around the mean. For a discrete variable \(X\), \(\mathrm{Var}(X) = \sum_x (x - E(X))^2 \, P(X = x)\), and the standard deviation is \(\sqrt{\mathrm{Var}(X)}\).
In both probability distribution and data modeling, the key is to create accurate and meaningful
representations of the underlying processes. Probability distributions describe uncertainty, while
data modeling provides a framework for understanding and predicting real-world phenomena based
on observed data. Together, they form the foundation for statistical analysis and decision-making in
various fields.
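To make the PMF/PDF/CDF distinction concrete, here is a brief sketch using scipy.stats; the binomial and normal parameters are arbitrary examples.

```python
from scipy import stats

# Discrete example: number of successes in 10 trials with p = 0.3
binom = stats.binom(n=10, p=0.3)
print("PMF  P(X = 3):    ", binom.pmf(3))   # probability mass at one value
print("CDF  P(X <= 3):   ", binom.cdf(3))   # cumulative probability
print("Mean, variance:   ", binom.mean(), binom.var())

# Continuous example: normal distribution with mean 100 and standard deviation 15
norm = stats.norm(loc=100, scale=15)
print("PDF density at 100:", norm.pdf(100))  # a density, not a probability
print("P(85 <= X <= 115): ", norm.cdf(115) - norm.cdf(85))
```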
Sampling is the process of selecting a subset of elements from a larger population to make
inferences about the entire population. It is a crucial step in statistical analysis and research, as it is
often impractical or impossible to study an entire population. Different sampling methods are used depending on the research goals and the characteristics of the population; common approaches include simple random, systematic, stratified, cluster, and convenience sampling.
Estimation involves using sample data to make inferences or predictions about population
parameters. Two main types of estimation are point estimation and interval estimation.
1. **Point Estimation:**
- Provides a single, specific value as an estimate for the population parameter. The sample mean
(\(\bar{X}\)) is a common point estimator for the population mean (\(\mu\)).
2. **Interval Estimation:**
- Provides a range of values within which the population parameter is likely to fall. It involves
constructing confidence intervals. The margin of error is a key component of interval estimation.
3. **Confidence Intervals:**
- A range of values constructed around a point estimate, providing a level of confidence that the
true population parameter falls within that range. Common confidence levels include 90%, 95%,
and 99%.
4. **Margin of Error:**
- The range above and below a point estimate within which the true parameter value is likely to
fall. It is influenced by the confidence level and variability in the sample.
5. **Hypothesis Testing:**
- While not a traditional estimation method, hypothesis testing is closely related. It involves
making a decision about a population parameter based on sample data and a null hypothesis. The
outcome of a hypothesis test can inform estimation.
6. **Maximum Likelihood Estimation (MLE):**
- A method for estimating the parameters of a statistical model. It seeks the parameter values that
maximize the likelihood function, representing the probability of observing the given sample.
7. **Bayesian Estimation:**
- Involves updating probability estimates based on prior knowledge and new evidence. It
combines prior beliefs (prior distribution) with the likelihood of observed data to obtain a posterior
distribution.
Sampling and estimation methods are fundamental components of statistical analysis. Careful
consideration of the sampling method and appropriate use of estimation techniques contribute to the
validity and reliability of research findings.
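As a small illustration of interval estimation, the sketch below constructs a 95% confidence interval for a population mean from a hypothetical sample, using the t distribution.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of customer order values
sample = np.array([48.0, 52.5, 46.8, 55.1, 50.2, 47.9, 53.4, 49.6, 51.8, 54.0])

n = len(sample)
mean = sample.mean()                  # point estimate of the population mean
sem = stats.sem(sample)               # standard error of the mean

confidence = 0.95
margin = stats.t.ppf((1 + confidence) / 2, df=n - 1) * sem  # margin of error

print(f"Point estimate (mean): {mean:.2f}")
print(f"Margin of error:       {margin:.2f}")
print(f"95% CI: ({mean - margin:.2f}, {mean + margin:.2f})")
```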
Unit 2:
Trendlines and Regression Analysis: Modelling Relationships and Trends in Data, Simple Linear Regression. Important Resources, Business Analytics Personnel, Data and Models for Business Analytics, Problem Solving, Visualizing and Exploring Data, Business Analytics Technology.
Modelling Relationships and Trends in Data
Modeling relationships and trends in data is a fundamental aspect of statistical analysis and data
science. It involves developing mathematical or statistical representations that capture the
underlying patterns and structures in the data. Here are several methods and techniques commonly
used for modeling relationships and trends:
1. **Linear Regression:**
- **Purpose:** Modeling a linear relationship between an independent variable (or multiple
variables) and a dependent variable.
- **Equation:** \(y = mx + b\), where \(y\) is the dependent variable, \(x\) is the independent variable, \(m\) is the slope, and \(b\) is the y-intercept.
2. **Polynomial Regression:**
- **Purpose:** Extending linear regression to model relationships with higher degrees. Useful
when a curve is a better fit for the data.
- **Equation:** \(y = f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \ldots + \beta_d x^d + \epsilon\)
3. **Exponential and Logarithmic Models:**
- **Exponential Model:** Describes exponential growth or decay.
- **Equation:** \(y = ab^x\), where \(a\) is the initial value, \(b\) is the growth or decay factor,
and \(x\) is the independent variable.
- **Logarithmic Model:** Useful for data that exhibits logarithmic trends.
- **Equation:** \(y = a + b \log(x)\).
4. **Time Series Models:**
- **Purpose:** Modeling trends, seasonality, and cyclic patterns in time-ordered data.
- **Methods:** Autoregressive Integrated Moving Average (ARIMA), Seasonal-Trend decomposition using LOESS (STL), and more.
5. **Nonlinear Least Squares:**
- **Purpose:** Fitting a model to data when the relationship is nonlinear in its parameters.
- **Methods:** Minimizing the sum of the squares of the differences between observed and
predicted values.
6. **Splines and Piecewise Regression:**
- **Purpose:** Capturing complex trends by fitting multiple simpler models to different
segments of the data.
- **Methods:** Piecewise linear regression or using spline functions.
7. **Generalized Additive Models (GAMs):**
- **Purpose:** Extending linear models to incorporate smooth functions of predictors. Useful
for capturing non-linear relationships.
- **Equation:** \(y = \beta_0 + f_1(x_1) + f_2(x_2) + \ldots + f_k(x_k) + \epsilon\), where
\(f_i(x_i)\) are smooth functions.
8. **Machine Learning Models:**
- **Purpose:** Predicting outcomes based on complex relationships and patterns in the data.
- **Methods:** Decision Trees, Random Forests, Support Vector Machines, Neural Networks,
etc.
9. **Bayesian Modeling:**
- **Purpose:** Incorporating prior knowledge into the modeling process.
- **Methods:** Bayesian Linear Regression, Bayesian Neural Networks, etc.
10. **Quantile Regression:**
- **Purpose:** Modeling relationships at different quantiles of the data distribution.
- **Equation:** \(Q_\tau(y) = X\beta_\tau\), where \(Q_\tau(y)\) is the \(\tau\)-th quantile of
\(y\).
When selecting a modeling approach, it's essential to consider the nature of the data, the assumed
relationship, and the interpretability of the model. It's also crucial to validate models using
techniques like cross-validation and assess model performance against relevant metrics. The choice
of modeling technique depends on the specific characteristics of the dataset and the goals of the
analysis.
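The following sketch contrasts a straight-line fit with a quadratic fit on synthetic data using numpy.polyfit; it is only meant to show how a higher-degree polynomial can capture curvature, as discussed above, and the data are generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with a curved (quadratic) trend plus noise
x = np.linspace(0, 10, 50)
y = 2 + 1.5 * x - 0.2 * x**2 + rng.normal(0, 1, x.size)

# Fit a degree-1 (linear) and a degree-2 (quadratic) model
fits = {"linear": np.polyfit(x, y, deg=1), "quadratic": np.polyfit(x, y, deg=2)}

# Compare residual sums of squares to see which model fits better
for name, coefs in fits.items():
    residuals = y - np.polyval(coefs, x)
    print(f"{name:9s} RSS = {np.sum(residuals**2):.1f}")
```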
Simple Linear Regression
**Simple Linear Regression: Overview**
Simple Linear Regression is a statistical method used to model the relationship between a single
independent variable (\(X\)) and a dependent variable (\(Y\)) by fitting a linear equation to the
observed data. The goal is to find the best-fitting straight line that minimizes the difference between
the observed values and the values predicted by the line. The equation for a simple linear regression
model is represented as:
\[ Y = \beta_0 + \beta_1X + \varepsilon \]
Here's an overview of the key components:
1. **Variables:**
- \( Y \): Dependent variable (response or outcome).
- \( X \): Independent variable (predictor or feature).
- \( \beta_0 \): Y-intercept, representing the value of \( Y \) when \( X \) is 0.
- \( \beta_1 \): Slope of the line, indicating the change in \( Y \) for a one-unit change in \( X \).
- \( \varepsilon \): Error term, accounting for unobserved factors affecting \( Y \) that are not
explained by the linear relationship with \( X \).
2. **Objective:**
- Minimize the sum of squared differences between the observed \( Y \) values and the values
predicted by the linear equation.
In summary, simple linear regression is a powerful tool for understanding and quantifying the
relationship between two variables. It provides a straightforward approach to modeling and making
predictions based on observed data. However, its applicability is limited to situations where a linear
relationship between the variables is reasonable.
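A minimal sketch of fitting this model by the least-squares formulas (slope = cov(X, Y) / var(X), intercept = mean(Y) - slope * mean(X)) follows; the data points are hypothetical.

```python
import numpy as np

# Hypothetical observations: advertising spend (X) and sales (Y)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3])

# Least-squares estimates of the slope (beta_1) and intercept (beta_0)
beta_1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
beta_0 = Y.mean() - beta_1 * X.mean()

# Predictions and the minimized sum of squared errors
Y_hat = beta_0 + beta_1 * X
sse = np.sum((Y - Y_hat) ** 2)

print(f"Fitted model: Y = {beta_0:.2f} + {beta_1:.2f} * X")
print(f"Sum of squared errors: {sse:.3f}")
```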
IMPORTANT RESOURCES:
It is necessary to understand the resource needs of a BA program to better comprehend the value of the information that BA provides. The need for BA resources varies by firm to meet particular decision support requirements. Some firms may choose to make a modest investment,
whereas other firms may have BA teams or a department of BA specialists. Regardless of the level
of resource investment, at minimum, a BA program requires resource investments in BA personnel,
data, and technology.
(1) Business Analytics Personnel
(2) Business Analytics Technology
(3) Business Analytics Data
Structured and unstructured data is needed to generate analytics. As a beginning for organizing data into an understandable framework, statisticians usually categorize data into meaningful groups.
Table 3.4 Typical Internal Sources of Data on Which Business Analytics Can Be Based
Table 3.5 Typical External Sources of Data on Which Business Analytics Can Be Based
There are many ways to categorize business analytics data. Data is commonly categorized by either internal or external sources. Typical examples of internal data sources include those presented in Table 3.4. When firms try to solve internal production or service operations problems, internally sourced data may be all that is needed. Typical external sources of data (see Table 3.5) are numerous and provide great diversity and unique challenges for BA to process. Data can be measured quantitatively (for example, sales dollars) or qualitatively, by preference surveys (for example, products compared based on consumers preferring one product over another) or by the amount of consumer discussion (chatter) on the Web regarding the pluses and minuses of competing products.
A major portion of the external data sources are found in the literature. For example, the US Census and the International Monetary Fund (IMF) are useful data sources at the macroeconomic level for model building.
Data quality is another important consideration. For example, missing data in files can prohibit some forms of statistical modeling, and incorrect coding of information can render a database completely useless. Data quality requires effort on the part of data managers to cleanse data of erroneous information and to repair or replace missing data.
Data privacy refers to the protection of shared data such that access is permitted only to those users for whom it is intended. It is a security issue that requires balancing the need to know against the risks of sharing too much.
One way to identify the personnel needed for BA staff is to examine what is required for certification in BA by organizations that provide BA services. INFORMS, a major academic and professional organization, announced the startup of a Certified Analytics Professional (CAP) program in 2013.
Another, more established organization, Cognizure, offers a variety of service products, including business analytics services. It offers a general certification, the Business Analytics Professional (BAP) exam, which measures existing skill sets in BA staff and identifies areas needing improvement. It is a tool to validate technical proficiency, expertise, and professional standards in BA. The certification consists of three exams covering the content areas listed in Table 3.1.
Most of the content areas in Table 3.1 will be discussed and illustrated in subsequent chapters and appendixes. The three exams required in the Cognizure certification program can easily be understood in the context of the three steps of the BA process (descriptive, predictive, and prescriptive).
The topics in Figure 3.1 of the certification program are applicable to the three
major steps in the BA process. The basic statistical tools apply to the descriptive analytics
step, the more advanced statistical tools apply to the predictive analytics step, and the
operations research tools apply to the prescriptive analytics step. Some of the tools can be
applied to both the descriptive and the predictive steps.
Likewise, tools like simulation can be applied to answer questions in both the predictive and the prescriptive steps, depending on how they're used. At the conjunction of all the tools is the reality of case studies. The use of case studies is designed to provide practical experience where all tools are employed to answer important questions or seek opportunities.
Figure 3.1 Certification content areas and their relationship to the steps in BA
These certification programs also include specialized skill sets related to BA personnel (administrators, designers, developers, solution experts, and specialists), as presented in Table 3.2.
Table 3.2 Types of BA Personnel
With the variety of positions and roles participants play in the BA process, the question becomes what skill sets or competencies are needed to function in BA. In a general sense, BA positions require competencies in business, analytic, and information systems skills. As listed in Table 3.3, business skills involve basic management of people and processes. BA personnel must communicate with BA staffers within the organization (the BA team members) and the other functional areas within a firm (BA customers and users) to be useful. Because they serve a variety of functional areas within a firm, BA personnel need to possess customer service skills so they can interact with the firm's personnel and understand the nature of the problems they seek to solve. BA personnel also need to sell their services to users inside the firm. In addition, some must lead a BA team or department, which requires considerable interpersonal management and leadership skills and abilities.
Table 3.3 Select Types of BA Personnel Skills or Competency Requirements
Fundamental to BA is an understanding of the analytic methodologies listed in Table 3.1 and others not listed. In addition to any tool sets, there is a need to know how they are integrated into the BA process to leverage data (structured or unstructured) and obtain the information desired by the customers who will be guided by the analytics.
In summary, the combination of data and models in business analytics enables organizations to
gain valuable insights, optimize processes, and make data-driven decisions for improved
performance and competitiveness.
Visualizing and Exploring Data
Visualizing and exploring data are essential steps in the data analysis process. These activities
help uncover patterns, trends, and relationships within the data, making it easier to derive
meaningful insights. Here are various techniques and tools for visualizing and exploring data:
### 1. **Descriptive Statistics:**
- Use summary statistics (mean, median, standard deviation) to understand the central
tendency and variability of the data.
- Identify outliers and anomalies that may require further investigation.
### 2. **Histograms:**
- Create histograms to visualize the distribution of a single variable.
- Understand the frequency and density of data points within different ranges.
### 3. **Box Plots (Box-and-Whisker Plots):**
- Display the distribution of a dataset and identify outliers.
- Show quartiles, median, and potential skewness.
### 4. **Scatter Plots:**
- Explore relationships between two continuous variables.
- Identify patterns, correlations, and potential outliers.
### 5. **Pair Plots:**
- Visualize pairwise relationships between multiple variables.
- Helpful for identifying patterns and correlations in multivariate datasets.
### 6. **Correlation Heatmaps:**
- Display correlation coefficients between variables using color intensity.
- Quickly identify strong positive or negative correlations.
### 7. **Line Charts:**
- Show trends in data over time or across a continuous variable.
- Useful for time-series data and continuous variables.
### 8. **Bar Charts:**
- Display the distribution of a categorical variable.
- Compare the frequency or proportion of different categories.
### 9. **Pie Charts:**
- Illustrate the proportion of each category in a whole.
- Useful for displaying parts of a whole (percentages).
### 10. **Area Charts:**
- Show the cumulative contribution of different variables over time.
- Effective for visualizing trends and patterns in cumulative data.
### 11. **Violin Plots:**
- Combine aspects of box plots and kernel density plots to display the distribution of data.
- Useful for comparing distributions across categories.
### 12. **Word Clouds:**
- Visualize word frequency in textual data.
- Words are displayed with sizes proportional to their frequencies.
### 13. **Geospatial Maps:**
- Use maps to visualize data with a geographic component.
- Display data points or aggregated values on a map.
### 14. **Interactive Dashboards:**
- Create interactive dashboards using tools like Tableau, Power BI, or Plotly.
- Allow users to explore data dynamically by adjusting parameters.
### 15. **3D Plots:**
- Visualize relationships in three-dimensional space.
- Useful when exploring interactions between three variables.
### 16. **Parallel Coordinates Plots:**
- Display multivariate data by representing each observation as a line.
- Useful for visualizing relationships between multiple variables.
### 17. **Network Graphs:**
- Visualize relationships between entities in a network.
- Nodes represent entities, and edges represent connections.
### 18. **Time Series Decomposition:**
- Decompose time-series data into trend, seasonality, and residual components.
- Understand the underlying patterns in time-dependent data.
### 19. **Distribution Plots (e.g., KDE Plots):**
- Visualize the probability distribution of a continuous variable.
- Kernel Density Estimation (KDE) plots provide a smooth estimate of the distribution.
### 20. **Treemaps:**
- Represent hierarchical data structures using nested rectangles.
- Visualize the proportion of each category within a hierarchy.
### Tools for Data Visualization:
- **Matplotlib:** A popular Python library for creating static, animated, and interactive
visualizations.
- **Seaborn:** Built on top of Matplotlib, Seaborn provides a high-level interface for
statistical data visualization.
- **Plotly:** Offers interactive and dynamic visualizations, including charts and dashboards.
- **Tableau:** A powerful data visualization tool that allows users to create interactive and
shareable dashboards.
- **Power BI:** A business analytics service by Microsoft for creating interactive reports
and dashboards.
### Best Practices:
- Choose visualizations based on the nature of your data and the insights you want to convey.
- Label axes, provide legends, and add annotations for clarity.
- Consider the audience and purpose of the visualization.
- Iterate and refine visualizations based on feedback and insights gained during exploration.
Remember that effective data visualization is not only about creating aesthetically pleasing
charts but also about conveying information in a clear and insightful manner.
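As a brief sketch of a few of the chart types listed above, the following uses matplotlib (already named in the tools list) on made-up data; seaborn or Plotly could be substituted for more polished or interactive output.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# Made-up data: daily revenue, related ad spend, and a categorical channel split
revenue = rng.normal(200, 30, 120)
ad_spend = revenue * 0.1 + rng.normal(0, 5, 120)
channel_share = {"Online": 45, "Retail": 35, "Wholesale": 20}

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Histogram: distribution of a single variable
axes[0].hist(revenue, bins=15)
axes[0].set_title("Revenue distribution")

# Scatter plot: relationship between two continuous variables
axes[1].scatter(ad_spend, revenue, alpha=0.6)
axes[1].set_xlabel("Ad spend")
axes[1].set_ylabel("Revenue")
axes[1].set_title("Ad spend vs. revenue")

# Bar chart: proportion of a categorical variable
axes[2].bar(list(channel_share.keys()), list(channel_share.values()))
axes[2].set_title("Sales by channel (%)")

plt.tight_layout()
plt.show()
```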
Data warehouses are databases that store current and historical data of potential interest to decision makers. A data warehouse makes data available to anyone who needs access to it. In a data warehouse, the data cannot be altered. Data warehouses also provide a set of query tools, analytical tools, and graphical reporting facilities. Some firms use intranet portals to make data warehouse information widely available throughout a firm.
Data marts are focused subsets or smaller groupings within a data warehouse. Firms often build enterprise-wide data warehouses, where a central data warehouse serves the entire organization, and smaller, decentralized data warehouses (called data marts) are focused on a limited portion of the organization's data that is placed in a separate database for a specific population of users. For example, a firm might develop a smaller database on just product quality to focus efforts on quality customer and product issues. A data mart can be constructed more quickly and at lower cost than an enterprise-wide data warehouse to concentrate effort in the areas of greatest concern.
Online analytical processing (OLAP) is software that allows users to view data in multiple dimensions. For example, employees can be viewed in terms of their age, sex, geographic location, and so on. OLAP would allow identification of the number of employees who are age 35, male, and in the western region of a country. OLAP allows users to obtain online answers to ad hoc questions quickly, even when the data is stored in very large databases.
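A rough sketch of the multidimensional, OLAP-style view described above can be mimicked in memory with a pandas cross-tabulation on a hypothetical employee table; real OLAP servers answer such questions against large warehouses.

```python
import pandas as pd

# Hypothetical employee records
employees = pd.DataFrame({
    "age":    [35, 35, 42, 29, 35, 51],
    "sex":    ["M", "F", "M", "F", "M", "M"],
    "region": ["West", "West", "East", "West", "East", "West"],
})

# OLAP-style question: how many employees are age 35, male, and in the West?
cube = pd.crosstab(index=employees["age"],
                   columns=[employees["sex"], employees["region"]])
print(cube)
print("Age 35, male, West:", cube.loc[35, ("M", "West")])
```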
Data mining is the application of a software-based, discovery-driven process that provides insights into business data by finding hidden patterns and relationships in big data or large databases and inferring rules from them to predict future behavior. The observed patterns and rules are used to guide decision-making. They can also act to forecast the impact of those decisions.
Text mining is a software application used to extract key elements from unstructured data sets, discover patterns and relationships in the text materials, and summarize the information.
Web mining seeks to find patterns, trends, and insights into customer behavior from
users of the Web.
Analysis ToolPak is an Excel add-in that contains a variety of statistical tools (for example, graphics and multiple regression) for the descriptive and predictive BA process steps. Another Excel add-in, Solver, contains operations research optimization tools (for example, linear programming) used in the prescriptive step of the BA process.
Table 3.8 Types of Information Obtainable with Data Mining Technology
Unit 3:
Organization Structures of Business Analytics, Team Management, Management Issues, Designing Information Policy, Outsourcing, Ensuring Data Quality, Measuring the Contribution of Business Analytics, Managing Change. Descriptive Analytics, Predictive Analytics, Predictive Modelling, Predictive Analytics Analysis, Data Mining, Data Mining Methodologies, Prescriptive Analytics and Its Step in the Business Analytics Process, Prescriptive Modelling, Nonlinear Optimization.
Organization Structures of Business Analytics
To successfully implement business analytics (BA) within organizations, BA, in whatever organizational form it takes, must be fully integrated throughout a firm. This requires BA resources to be aligned in a way that permits a view of customer information within and across all departments, access to customer information from multiple sources (internal and external to the organization), access to historical analytics from a central repository, and alignment of technology resources so they are accountable for analytic success. The commonality of these requirements is the desire for an alignment that maximizes the flow of information into and through the BA operation, which in turn processes and shares information with desired users throughout the organization.
(A) Most organizations are hierarchical, with senior managers making the strategic planning decisions, middle-level managers making tactical planning decisions, and lower-level managers making operational planning decisions. Within the hierarchy, other organizational structures exist to support the development and existence of groupings of resources like those needed for BA. These additional structures include programs, projects, and teams. A program in this context is the process that seeks to create an outcome and usually involves managing several related projects with the intention of improving organizational performance. A program can also be a large project. A project tends to deliver outcomes and can be defined as having temporary rather than permanent social systems within or across organizations to accomplish particular and clearly defined tasks, usually under time constraints. Projects are often composed of teams. A team consists of a group of people with the skills to achieve a common purpose. Teams are especially appropriate for conducting complex tasks that have many interdependent subtasks.
The relationship of programs, projects, and teams with a business hierarchy is presented in Figure 4.1. Within this hierarchy, the organization's senior managers establish a BA program initiative to mandate the creation of a BA grouping within the firm as a strategic goal. A BA program does not always have an end-time limit. Middle-level managers reorganize or break down the strategic BA program goals into doable BA project initiatives to be undertaken in a fixed period of time. Some firms have only one project (establish a BA grouping), and others, depending on the organization structure, have multiple BA projects requiring the creation of multiple BA groupings. Projects usually have an end date by which to judge the success of the project. The projects in some cases are further reorganized into smaller assignments, called BA team initiatives, to operationalize the broader strategy of the BA program. BA teams may have a long-standing time limit (for example, to exist as the main source of analytics for an entire organization) or a fixed period (for example, to work on a specific product quality problem and then end).
BA organization structures usually begin with an initiative that recognizes the need to use and develop some kind of program in analytics. Fortunately, most firms today recognize this need. The question then becomes how to match the firm's needs within the organization to achieve its strategic, tactical, and operations objectives within resource limitations. Planning the BA resource allocation within the organizational structure of a firm is a starting place for the alignment of BA to best serve a firm's needs.
Aligning the BA resources requires a determination of the amount of resources a firm wants to invest. The outcome of the resource investment might identify only one individual to compute analytics for a firm. Because of the varied skill sets in information systems, statistics, and operations research methods, a more common beginning for a BA initiative is the creation of a BA team organization structure possessing a variety of analytical and management skills.
(B) Another way of aligning BA resources within an organization is to use a project structure.
Most firms undertake projects, and some firms actually use a project structure for their entire
organization.
The needs of each firm for BA sometimes dictate positioning BA within existing organization functional areas. Clearly, many alternative structures can house a BA grouping. For example, because BA provides information to users, BA could be included in the functional area of management information systems, with the chief information officer (CIO) acting as both the director of information systems (which includes database management) and the leader of the BA grouping.
(C) Another structure, found in large organizations, aligns resources by project or product and is called a matrix organization. As illustrated in Figure 4.3, this structure allows the VPs some indirect control over their related specialists, which would include the BA specialists, but it also allows direct control by the project or product manager. Like the functional organizational structure, this does not promote the cross-department access suggested for a successful implementation of a BA program.
Figure 4.3 Matrix organization structure
The literature suggests that the organizational structure that best aligns BA resources is one in which a department, project, or team is formed in a staff structure where access to and from the BA grouping of resources permits access to all areas within a firm, as illustrated in Figure 4.4. The dashed line indicates a staff (not line management) relationship. This centralized BA organization structure minimizes investment costs by avoiding the duplications found in both the functional and the matrix styles of organization structures. At the same time, it maximizes information flow between and across functional areas in the organization. This is a logical structure for a BA group in its advisory role to the organization.
Team Management
When it comes to getting the BA job done, the work tends to fall to a BA team. For firms that employ BA teams, the participants can be defined by the roles they play in the team effort. Some of the roles BA team participants undertake and their typical backgrounds are presented in Table 4.2.
Aligning BA teams to achieve their tasks requires collaboration efforts from team members and from their organizations. Collaboration involves working with people to achieve a shared and explicit set of goals consistent with their mission. BA teams also have a specific mission to complete, and collaboration through teamwork is the means to accomplish that mission.
Team members' need for collaboration is motivated by changes in the nature of work (no more silos to hide behind, a much more open environment, and so on), growth in professions (for example, interactive jobs tend to be more professional, requiring greater variety in expertise sharing), and the need to nurture innovation (creativity and innovation are fostered by collaboration with a variety of people sharing ideas). To keep one's job and to progress in any business career, particularly in BA, team members must embrace working with other members both inside the team and outside it.
Table 4.2 BA Team Participant Roles*
For organizations, collaboration is motivated by the changing nature of information flow (that is, hierarchical flows tend to be downward, whereas in modern organizations, flow is in all directions) and changes in the scope of business operations (that is, going from domestic to global allows for a greater flow of ideas and information from multiple sources in multiple locations).
Management Issues
Managing Change:
Wells (2000) found that what is critical in changing organizations is organizational culture and the use of change management. Organizational culture is how an organization supports cooperation, coordination, and empowerment of employees. Change management is defined as an approach for transitioning the organization (individuals, teams, projects, departments) to a changed and desired future state. Change management is a means of implementing change in an organization, such as adding a BA department. Changes in an organization can be either planned (a result of specific and planned efforts at change with direction by a change leader) or unplanned (spontaneous changes without the direction of a change leader).
The application of BA will invariably result in both types of changes because of BA's specific problem-solving role (a desired, planned change to solve a problem) and its exploratory, opportunity-finding nature (i.e., unplanned new-knowledge opportunity changes). Change management can also target almost everything that makes up an organization (see Table 4.7). Some of the activities that lead to change management success are presented as best practices in Table 4.8.
Table 4.8 Change Management Best Practices
Descriptive Analytics
Descriptive analytics involves analyzing and summarizing historical data to gain insights into
patterns, trends, and characteristics of a particular phenomenon. This type of analytics focuses on
understanding what has happened in the past. Below are some examples of descriptive analytics and
how they can be illustrated:
1. **Histograms:**
- **Description:** Histograms are graphical representations of the distribution of a dataset. They
display the frequency or probability of different values in a dataset.
- **Illustration:** A histogram can be created to show the distribution of sales revenue for a
specific product over the past year. The x-axis represents revenue ranges, and the y-axis represents
the frequency or count of occurrences in each range.
2. **Pie Charts:**
- **Description:** Pie charts are circular statistical graphics that are divided into slices to
illustrate numerical proportions.
- **Illustration:** A pie chart can be used to show the percentage distribution of sales across
different product categories. Each slice represents a product category, and the size of the slice
corresponds to its percentage share of the total sales.
3. **Line Charts:**
- **Description:** Line charts are used to represent data points over a continuous interval or time
span. They are commonly used to show trends over time.
- **Illustration:** A line chart can illustrate the monthly website traffic over the past year. Each
point on the line represents the number of visits in a specific month, showing the overall trend of
website traffic.
4. **Scatter Plots:**
- **Description:** Scatter plots display individual data points on a two-dimensional graph, with
one variable on the x-axis and another on the y-axis. They are useful for identifying relationships
between variables.
- **Illustration:** A scatter plot can show the relationship between advertising spending and sales
revenue. Each point represents a specific time period, and the position of the point indicates the
corresponding values for advertising spending and sales revenue.
5. **Tabular Reports:**
- **Description:** Tabular reports present data in a table format, providing a detailed view of
individual data points or summary statistics.
- **Illustration:** A tabular report can display monthly expenses for a business, breaking down
costs into categories such as utilities, rent, and salaries. Each row represents a specific month, and
columns show the expenses for each category.
Descriptive analytics tools and visualizations help businesses and analysts make sense of historical
data, identify patterns, and draw insights to inform decision-making processes.
Predictive Analytics
Predictive analytics involves using data, statistical algorithms, and machine learning techniques to
identify the likelihood of future outcomes based on historical data. Here are some examples
illustrating predictive analytics:
1. **Credit Scoring:**
- **Scenario:** A bank wants to predict the likelihood of a customer defaulting on a loan.
- **Illustration:** Using predictive analytics, the bank can develop a credit scoring model. The
model considers various factors such as credit history, income, and debt to predict the probability of
a customer defaulting on a loan. This helps the bank make informed decisions about loan approvals
and interest rates.
2. **Customer Churn Prediction:**
- **Scenario:** A telecommunications company wants to identify customers at risk of churning.
- **Illustration:** By analyzing historical customer data, including usage patterns, customer
service interactions, and billing information, a predictive model can be built to forecast which
customers are likely to churn. The company can then take proactive measures, such as targeted
promotions or retention offers, to reduce churn.
3. **Inventory Management:**
- **Scenario:** An e-commerce retailer wants to optimize inventory levels.
- **Illustration:** Predictive analytics can be applied to analyze past sales data, seasonality, and
other relevant factors. By forecasting future demand for each product, the retailer can optimize
inventory levels, reduce carrying costs, and ensure products are available when customers want to
purchase them.
4. **Healthcare Readmission Prediction:**
- **Scenario:** A hospital aims to predict the likelihood of a patient being readmitted after a
specific medical procedure.
- **Illustration:** Using predictive analytics, the hospital can analyze patient data, including
medical history, vital signs, and previous admissions. A predictive model can then identify patients
at a higher risk of readmission, allowing healthcare providers to intervene with appropriate care and
resources to reduce readmission rates.
5. **Predictive Maintenance in Manufacturing:**
- **Scenario:** A manufacturing plant wants to minimize equipment downtime by predicting
when machinery is likely to fail.
- **Illustration:** Sensor data from machines can be analyzed using predictive analytics to
identify patterns indicative of potential equipment failure. By predicting maintenance needs in
advance, the plant can schedule maintenance activities proactively, reducing unplanned downtime
and optimizing operational efficiency.
6. **Fraud Detection:**
- **Scenario:** A financial institution aims to detect fraudulent transactions.
- **Illustration:** Predictive analytics can analyze transaction data, looking for patterns and
anomalies that may indicate fraudulent activity. Machine learning models can continuously learn
from new data to improve their accuracy in identifying potentially fraudulent transactions in real-
time.
Predictive analytics enables organizations to make data-driven decisions, anticipate future trends,
and proactively address challenges or opportunities. It is a powerful tool for enhancing business
operations across various industries.
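To illustrate the churn-prediction scenario above in code, here is a compact sketch using scikit-learn's logistic regression on synthetic customer data; the feature names, coefficients, and data are invented for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 500

# Synthetic customer data: usage, support calls, tenure, and a churn outcome
df = pd.DataFrame({
    "monthly_usage_hours": rng.uniform(1, 60, n),
    "support_calls": rng.poisson(2, n),
    "tenure_months": rng.integers(1, 72, n),
})
logit = (-1.5 - 0.04 * df["monthly_usage_hours"]
         + 0.5 * df["support_calls"] - 0.02 * df["tenure_months"])
df["churned"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="churned"), df["churned"], test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predicted churn probabilities can drive targeted retention offers
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("Churn probability, first 5 test customers:",
      model.predict_proba(X_test)[:5, 1].round(2))
```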
Predictive Modelling
Predictive modeling means developing models that can be used to forecast or predict future events. In business analytics, models can be developed based on logic or data.
An ideal multiple-variable modeling approach that can be used in this situation to explore variable importance in this case study, and eventually lead to the development of a predictive model for product sales, is correlation and multiple regression. We will use both Excel and IBM's SPSS statistical packages to compute the statistics in this step of the BA process.
First, we must consider the four independent variables—radio, TV, newspaper, POS—
before developing the model.
One way to see the statistical direction of the relationship (which is better than just comparing graphic charts) is to compute the Pearson correlation coefficient r between each of the independent variables and the dependent variable (product sales). The SPSS correlation coefficients and their levels of significance are presented in Table 6.4. The comparable Excel correlations are presented in Figure 6.5.
Although it can be argued that positive or negative correlation coefficients should not automatically discount any variable from what will be a predictive model, the negative correlation of newspaper ads suggests that as a firm increases investment in newspaper ads, product sales will decrease. This does not make sense in this case study. Given the illogic of such a relationship, its potential use as an independent variable in a model is questionable. This negative correlation also poses several questions that should be considered. Was the data set correctly collected? Is the data set accurate? Was the sample large enough to have included enough data for this variable to show a positive relationship? Should it be included for further analysis? Although it is possible for a negative relationship to show up statistically like this, it does not make sense in this case. Based on this reasoning and the fact that the correlation is not statistically significant, this variable (i.e., newspaper ads) will be removed from further consideration in this exploratory analysis to develop a predictive model.
Some researchers might also exclude POS based on the insignificance (p = 0.479) of its relationship with product sales. However, for purposes of illustration, it will continue to be considered a candidate for model inclusion. The other two independent variables (radio and TV) were both found to be significantly related to product sales, as reflected in the correlation coefficients in the tables.
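A brief sketch of this kind of correlation screening is shown below using scipy's pearsonr; synthetic data stands in for the case study's advertising and sales columns, since the Figure 6.4 data is not reproduced here.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(3)
n = 24  # e.g., 24 months of hypothetical data

sales = rng.normal(500, 80, n)
ads = pd.DataFrame({
    "radio":     sales * 0.05 + rng.normal(0, 5, n),   # constructed to correlate
    "tv":        sales * 0.10 + rng.normal(0, 10, n),  # constructed to correlate
    "newspaper": rng.normal(30, 8, n),                 # unrelated noise
    "pos":       rng.normal(15, 4, n),                 # unrelated noise
})

# Pearson r and significance for each candidate predictor versus product sales
for col in ads.columns:
    r, p = stats.pearsonr(ads[col], sales)
    print(f"{col:10s} r = {r:+.2f}  p = {p:.3f}")
```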
The procedure by which multiple regression can be used to evaluate which independent variables are best to include or exclude in a linear model is called step-wise multiple regression. It is based on an evaluation of regression models and their validation statistics—specifically, the multiple correlation coefficients and the F-ratio from an ANOVA. SPSS software and many other statistical systems build in the step-wise process. Some are called backward step-wise regression, and some are called forward step-wise regression. Backward step-wise regression starts with all the independent variables placed in the model, and the step-wise process removes them one at a time, worst predictors first, until a statistically significant model emerges. Forward step-wise regression starts with the best related variable (using correlation analysis as a guide) and then step-wise adds other variables until adding more will no longer improve the accuracy of the model. The forward step-wise regression process will be illustrated here manually. The first step is to generate individual regression models and statistics for each independent variable with the dependent variable, one at a time. These three models are presented in Tables 6.5, 6.6, and 6.7 for the POS, radio, and TV variables, respectively. The comparable Excel regression statistics are presented in Tables 6.8, 6.9, and 6.10 for the POS, radio, and TV variables, respectively.
Table 6.5 SPSS POS Regression Model: Marketing/Planning Case Study
Table 6.6 SPSS Radio Regression Model: Marketing/Planning Case Study
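The forward step-wise idea can be sketched in Python as follows; this is a simplified manual selection loop on synthetic data, using adjusted R-squared as the improvement criterion, and is not a reproduction of the SPSS or Excel output in Tables 6.5 through 6.10.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 24

# Synthetic stand-in for the case study data (values in $000)
X = pd.DataFrame({
    "radio": rng.uniform(10, 60, n),
    "tv":    rng.uniform(20, 120, n),
    "pos":   rng.uniform(1, 10, n),
})
y = 100 + 4.0 * X["tv"] + 2.5 * X["radio"] + rng.normal(0, 40, n)

selected, remaining = [], list(X.columns)
best_adj_r2 = -np.inf

# Forward step-wise: repeatedly add the variable that most improves adjusted R-squared
while remaining:
    scores = {}
    for var in remaining:
        model = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
        scores[var] = model.rsquared_adj
    best_var = max(scores, key=scores.get)
    if scores[best_var] <= best_adj_r2:
        break  # no further improvement, so stop adding variables
    best_adj_r2 = scores[best_var]
    selected.append(best_var)
    remaining.remove(best_var)

print("Selected predictors:", selected, "adjusted R^2 =", round(best_adj_r2, 3))
```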
The value of knowing this association of products can improve the performance of the store by reducing the need to spend money on advertising both products. The benefit is real if the association holds true. Finding the association and proving it to be valid requires some analysis.
From the descriptive analytics analysis, some possible associations may have been uncovered, such as the association between products A and B. With any size data file, the normal procedure in data mining is to divide the file into two parts. One is referred to as a training data set, and the other as a validation data set. The training data set is used to develop the association rules, and the validation data set is used to test and prove that the rules work. Starting with the training data set, a common data mining methodology is what-if analysis using logic-based software. Excel and SPSS both have what-if logic-based software applications, as do a number of other software vendors. These software applications allow logic expressions. (For example, if product A is present, then is product B present?) The systems can also provide frequency and probability information to show the strength of the association. These software systems have differing capabilities, which permit users to deterministically simulate different scenarios to identify complex combinations of associations between product purchases in a market basket.
Once a collection of possible associations is identified and their probabilities are computed, the same logic associations (now considered association rules) are rerun using the validation data set. A new set of probabilities can be computed, and those can be statistically compared using hypothesis testing methods to determine their similarity. Other software systems compute correlations for testing purposes to judge the strength and direction of the relationship. In other words, if the consumer buys product A first, it could be referred to as the Head and product B as the Body of the association. If the same basic probabilities are statistically significant, it lends validity to the association rules and their use for predicting market basket item purchases based on groupings of products.
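As a small sketch of the association logic described above ("if product A is present, is product B present?"), the following computes support and confidence for a hypothetical rule on made-up training and validation baskets.

```python
# Hypothetical market baskets, split into training and validation sets
training = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"},
    {"A", "B", "D"}, {"A", "B"}, {"C", "D"}, {"A", "B", "C"},
]
validation = [
    {"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B", "D"}, {"A", "B"},
]

def rule_stats(baskets, head, body):
    """Support and confidence for the rule: if `head` is bought, `body` is bought too."""
    n = len(baskets)
    head_count = sum(1 for b in baskets if head in b)
    both_count = sum(1 for b in baskets if head in b and body in b)
    support = both_count / n
    confidence = both_count / head_count if head_count else 0.0
    return support, confidence

for name, baskets in [("training", training), ("validation", validation)]:
    s, c = rule_stats(baskets, head="A", body="B")
    print(f"{name:10s} rule A -> B: support = {s:.2f}, confidence = {c:.2f}")
```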
Data Mining Methodologies
Data mining is an ideal predictive analytics tool used in the BA process. Table 6.2 lists a small sampling of data mining methodologies for acquiring different types of information. Some of the same tools used in the descriptive analytics step are used in the predictive step, but they are employed to establish a model (either based on logical connections or quantitative formulas) that may be useful in predicting the future.
Several computer-based methodologies listed in Table 6.2 are briefly introduced here. Neural
networks are used to find associations where connections between words or numbers can be
determined. Specifically, neural networks can take large volumes of data and potential variables
and explore variable associations to express a beginning variable (referred to as an input layer),
through middle layers of interacting variables, and finally to an ending variable (referred to as an
output). More than just identifying simple one-on-one associations, neural networks link multiple
association pathways through big data like a collection of nodes in a network. These nodal
relationships constitute a form of classifying groupings of variables as related to one another, but
even more, related in complex paths with multiple associations. SPSS has two versions of neural
network software functions: Multilayer Perceptron (MLP) and Radial Basis Function (RBF). Both
procedures produce a predictive model for one or more dependent variables based on the values of
the predictive variables. Both allow a decision maker to develop, train, and use the software to
identify particular traits (such as bad loan risks for a bank) based on characteristics from data
collected on past customers.
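As a rough illustration of the same idea outside SPSS, the sketch below trains a small multilayer perceptron on hypothetical loan data with scikit-learn. The features, labels, and network size are assumptions made purely for demonstration, not part of the case study.

```python
# Sketch: a multilayer perceptron flagging bad loan risks (illustrative data).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))        # e.g., income, debt, age, tenure (hypothetical)
y = (X[:, 1] - X[:, 0] + rng.normal(size=500) > 0).astype(int)   # 1 = bad risk

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)            # "train" the network on past customers
print("holdout accuracy:", net.score(X_test, y_test))
```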
The case study firm had collected a random sample of monthly sales information presented
in Figure 6.4 listed in thousands of dollars. What the firm wants to know is, given a fixed budget
of $350,000 for promoting this service product, when offered again, how best should the
company allocate budget dollars in hopes of maximizing the future estimated month’s product
sales? Before making any allocation of budget, there is a need to understand how to estimate
future product sales. This requires understanding the behavior of product sales relative to sales
promotion efforts using radio, paper, TV, and point-of-sale (POS) ads.
Figure 6.4 Data for marketing/planning case study
The analysis also revealed little regarding the relationship of newspaper and POS ads to
product sales. So although radio and TV commercials are most promising, a more in-depth
predictive analytics analysis is called for to accurately measure and document the degree of
relationship that may exist in the variables to determine the best predictors of product sales.
Prescriptive Modelling,
After undertaking the descriptive and predictive analytics steps in the BA process, one
should be positioned to undertake the final step: prescriptive analytics analysis. The prior
analysis should provide a forecast or prediction of what future trends in the business may
hold. For example, there may be significant statistical measures of increased (or decreased)
sales, profitability trends accurately measured in dollars for new market opportunities, or
measured cost savings from a future joint venture.
Step 3 of the BA process, prescriptive analytics, involves the application of decision
science, management science, or operations research methodologies to make best use of
allocable resources. These are mathematically based methodologies and algorithms
designed to take variables and other parameters into a quantitative framework and generate
an optimal or near-optimal solution to complex problems. These methodologies can be used
to optimally allocate a firm’s limited resources to take best advantage of the opportunities
it has found in the predicted future trends. Limits on human, technology, and financial
resources prevent any firm from going after all the opportunities. Using prescriptive
analytics allows the firm to allocate limited resources to optimally or near-optimally achieve
the objectives as fully as possible.
The listing of the prescriptive analytic methodologies as they are in some cases utilized in the BA process is again presented in Figure 7.1 and forms the basis of this chapter's content.
Prescriptive Modeling:
The listing of prescriptive analytic methods and models in Figure 7.1 is but a small
grouping of many operations research, decision science, and management science
methodologies that are applied in this step of the BA process. The explanation and use of
most of the methodologies in Table 7.1 are explained throughout this book. (See the Additional
Information column in Table 7.1.)
Nonlinear Optimization.
When business performance cost or profit functions become too complex for simple
linear models to be useful, exploration of nonlinear functions is a standard practice in BA.
Although the predictive nature of exploring for a mathematical expression to denote a trend
or establish a forecast falls mainly in the predictive analytics step of BA, the use of the
nonlinear function to optimize a decision can fall in the prescriptive analytics step.
There are many mathematical programming nonlinear methodologies and solution
procedures designed to generate optimal business performance solutions. Most of them
require careful estimation of parameters that may or may not be accurate, particularly given
the precision required of a solution that can be so precariously dependent upon parameter
accuracy. This challenge is further complicated in BA by the large data files that should be
factored into the model-building effort.
To overcome these limitations and be more inclusive in the use of large data, regression
software can be applied. Curve Fitting software can be used to generate predictive
analytic models that can also be utilized to aid in making prescriptive analytic decisions.
For purposes of illustration, SPSS’s Curve Fitting software will be used in this chapter.
Suppose that a resource allocation decision is being faced whereby one must decide how
many computer servers a service facility should purchase to optimize the firm’s costs of
running the facility. The firm’s predictive analytics effort has shown a growth trend. A new
facility is called for if costs can be minimized. The firm has a history of setting up large and
small service facilities and has collected the 20 data points in Figure 7.2.
Figure 7.2 Data and SPSS Curve Fitting function selection window
The first step in using the curve-fitting methodology is to generate the best-fitting
curve to the data. By selecting all the SPSS models in Figure 7.2, the software fits each model
to the data using the regression process of minimizing the distance of the data points from the
fitted curve. The result is a series of regression models and statistics, including ANOVA and other testing statistics.
It is known from the previous illustration of regression that the adjusted R-Square statistic
can reveal the best estimated relationship between the independent (number of servers) and
dependent (total cost) variables. These statistics are presented in Table 7.2. The best adjusted
R-Square value (the largest) occurs with the quadratic model, followed by the cubic model.
The more detailed supporting statistics for both of these models are presented in Table 7.3.
The graph for all the SPSS curve-fitting models appears in Figure 7.4.
Table 7.2 Adjusted R-Square Values of All SPSS Models
Table 7.3 Quadratic and Cubic Model SPSS Statistics
From Table 7.3, the resulting two statistically significant curve-fitted models follow:

Yp = 35417.772 − 5589.432 X + 268.445 X²   [Quadratic model]
Yp = 36133.696 − 5954.738 X + 310.895 X² − 1.347 X³   [Cubic model]

where:
Yp = the forecasted or predicted total cost, and
X = the number of computer servers.
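Because the quadratic model's coefficients are reported above, the prescriptive question of how many servers to buy can be sketched directly. The following Python sketch uses only the quadratic equation from Table 7.3; the integer search over neighbouring whole numbers is illustrative.

```python
# Sketch: use the fitted quadratic cost model prescriptively to choose the
# number of servers that minimizes predicted total cost.
def quadratic_cost(x):
    return 35417.772 - 5589.432 * x + 268.445 * x ** 2

# Vertex of the parabola: x* = 5589.432 / (2 * 268.445), roughly 10.4 servers.
x_star = 5589.432 / (2 * 268.445)

# Servers come in whole units, so compare the neighbouring integers.
best_cost, best_n = min((quadratic_cost(n), n) for n in (int(x_star), int(x_star) + 1))
print(f"continuous optimum = {x_star:.2f}; best integer choice: {best_n} servers "
      f"at predicted cost {best_cost:,.0f}")
```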
Unit 4:
Forecasting Techniques: Qualitative and Judgmental Forecasting, Statistical Forecasting Models,
Forecasting Models for Stationary Time Series, Forecasting Models for Time Series with a Linear
Trend, Forecasting Time Series with Seasonality, Regression Forecasting with Causal Variables,
Selecting Appropriate Forecasting Models. Monte Carlo Simulation and Risk Analysis: Monte Carlo
Simulation Using Analytic Solver Platform, New-Product Development Model, Newsvendor Model,
Overbooking Model, Cash Budget Model.
Qualitative and Judgmental Forecasting,
Qualitative and judgmental techniques rely on experience and intuition; they are
necessary when historical data are not available or when the decision maker needs to forecast
far into the future. Another use of judgmental methods is to incorporate nonquantitative
information, such as the impact of government regulations or competitor behavior, in a
quantitative forecast. Judgmental techniques range from such simple methods as a manager's
opinion or a group-based jury of executive opinion to more structured approaches such as
historical analogy and the Delphi method.
Many forecasts are based on analysis of historical time-series data and are predicated
on the assumption that the future is an extrapolation of the past. A trend is a gradual upward
or downward movement of a time series over time.
Time series may also exhibit short-term seasonal effects (over a year, month, week, or
even a day) as well as longer-term cyclical effects, or nonlinear trends. A seasonal effect is one
that repeats at fixed intervals of time, typically a year, month, week, or day. At a neighborhood
grocery store, for instance, short-term seasonal patterns may occur over a week, with the
heaviest volume of customers on weekends; seasonal patterns may also be evident during the
course of a day, with higher volumes in the mornings and late afternoons. Figure 9.2 shows
seasonal changes in natural gas usage for a homeowner over the course of a year (Excel file
Gas & Electric). Cyclical effects describe ups and downs over a much longer time frame, such
as several years. Figure 9.3 shows a chart of the data in the Excel file Federal Funds Rates.
We see some evidence of long-term cycles in the time series driven by economic factors, such
as periods of inflation and recession.
For time series with a linear trend but no significant seasonal components, double
moving average and double exponential smoothing models are more appropriate than using
simple moving average or exponential smoothing models. Both methods are based on the linear
trend equation:

Ft+k = at + bt k
That is, the forecast for k periods into the future from period t is a function of a base
value at, also known as the level, and a trend, or slope, bt. Double moving average and double
exponential smoothing differ in how the data are used to arrive at appropriate values for at and
bt. Because the calculations are more complex than for simple moving average and exponential
smoothing models, it is easier to use forecasting software than to try to implement the models
directly on a spreadsheet. Therefore, we do not discuss the theory or formulas underlying the
methods. XLMiner does not support a procedure for double moving average; however, it does
provide one for double exponential smoothing.
In double exponential smoothing, the estimates of at and bt are obtained from the following
equations (9.7):

at = αAt + (1 − α)(at−1 + bt−1)
bt = β(at − at−1) + (1 − β)bt−1
In essence, we are smoothing both parameters of the linear trend model. From the first
equation, the estimate of the level in period t is a weighted average of the observed value at
time t and the predicted value at time t, at-1 + bt-1, based on simple exponential smoothing.
For large values of α, more weight is placed on the observed value; lower values of α put more
weight on the smoothed predicted value. Similarly, from the second equation, the estimate of
the trend in period t is a weighted average of the difference in the estimated levels in periods
t and t − 1 and the previous estimate of the trend in period t − 1.
Larger values of β place more weight on the difference in the levels, but lower values
of β put more emphasis on the previous estimate of the trend. Initial values are chosen for a1
as A1 and b1 as A2 − A1. Equations (9.7) must then be used to compute at and bt for the entire
time series to be able to generate forecasts into the future. As with simple exponential
smoothing, we are free to choose the values of α and β. However, it is easier to let XLMiner
optimize these values using historical data.
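For readers who want to see the mechanics rather than rely on XLMiner, here is a minimal sketch of double exponential smoothing that uses the initialization a1 = A1 and b1 = A2 − A1 described above; the series and the values of α and β are illustrative.

```python
# Minimal sketch of double (Holt's linear trend) exponential smoothing.
def double_exponential_smoothing(series, alpha, beta, k=1):
    a = series[0]                 # initial level a1 = A1
    b = series[1] - series[0]     # initial trend b1 = A2 - A1
    for t in range(1, len(series)):
        a_prev, b_prev = a, b
        a = alpha * series[t] + (1 - alpha) * (a_prev + b_prev)   # level equation
        b = beta * (a - a_prev) + (1 - beta) * b_prev             # trend equation
    return a + b * k              # forecast k periods beyond the last observation

demand = [12, 14, 15, 17, 19, 22, 24]        # hypothetical time series
print(double_exponential_smoothing(demand, alpha=0.5, beta=0.3, k=2))
```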
Quite often, time-series data exhibit seasonality, especially on an annual basis. When
time series exhibit seasonality, techniques that explicitly model the seasonal pattern provide
better forecasts than methods that ignore it.
One approach is to use linear regression. Multiple linear regression models with
categorical variables can be used for time series with seasonality. To do this, we use
dummy categorical variables for the seasonal components.
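A minimal sketch of this dummy-variable approach is shown below, assuming hypothetical quarterly data and using statsmodels in place of a spreadsheet regression tool; the formula interface builds the seasonal dummies from the categorical quarter column.

```python
# Sketch: regression with a trend term and seasonal dummy variables.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "t": range(1, 13),                                    # time index
    "quarter": ["Q1", "Q2", "Q3", "Q4"] * 3,              # seasonal category
    "sales": [10, 14, 18, 12, 11, 16, 20, 13, 13, 17, 22, 15],
})
model = smf.ols("sales ~ t + C(quarter)", data=df).fit()  # trend + seasonal dummies
print(model.params)

# Forecast the next period (t = 13, which falls in Q1).
new = pd.DataFrame({"t": [13], "quarter": ["Q1"]})
print(model.predict(new))
```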
These methods are based on the work of two researchers: C.C. Holt, who developed the basic
approach, and P.R. Winters, who extended Holt's work. Hence, these approaches are commonly
referred to as Holt-Winters models. Holt-Winters models are similar to exponential
smoothing models in that smoothing constants are used to smooth out variations in the
level and seasonal patterns over time. For time series with seasonality but no trend,
XLMiner supports a Holt-Winters method but does not have the ability to optimize the
parameters.
Many time series exhibit both trend and seasonality. Such might be the case for growing
sales of a seasonal product. These models combine elements of both the trend and seasonal
models. Two types of Holt-Winters smoothing models are often used. The additive model
forecasts one period ahead as

Ft+1 = at + bt + St−s+1   (9.8)
The additive model applies to time series with relatively stable seasonality, whereas
the multiplicative model applies to time series whose amplitude increases or decreases over
time. Therefore, a chart of the time series should be viewed first to identify the appropriate
type of model to use. Three parameters, α, β, and γ, are used to smooth the level, trend, and
seasonal factors in the time series. XLMiner supports both models.
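As an illustration outside XLMiner, the sketch below fits additive and multiplicative Holt-Winters models with statsmodels; the short series and seasonal_periods = 4 (quarterly seasonality) are assumptions made purely for demonstration.

```python
# Sketch: additive and multiplicative Holt-Winters models via statsmodels.
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

y = pd.Series([10, 14, 18, 12, 11, 16, 20, 13, 13, 17, 22, 15])

additive = ExponentialSmoothing(y, trend="add", seasonal="add",
                                seasonal_periods=4).fit()
multiplicative = ExponentialSmoothing(y, trend="add", seasonal="mul",
                                      seasonal_periods=4).fit()

print(additive.forecast(4))         # next four periods, additive seasonality
print(multiplicative.forecast(4))   # next four periods, multiplicative seasonality
```

A chart of the series should still be viewed first, as the text notes, to decide whether the additive or multiplicative form is appropriate.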
In many forecasting applications, other independent variables besides time, such as economic
indexes or demographic factors, may influence the time series. For example, a manufacturer
of hospital equipment might include such variables as hospital capital spending and
changes in the proportion of people over the age of 65 in building models to forecast future
sales. Explanatory/causal models, often called econometric models, seek to identify factors
that explain statistically the patterns observed in the variable being forecast, usually with
regression analysis. We will use a simple example of forecasting gasoline sales to illustrate
econometric modeling.
Figure 9.27 shows gasoline sales over 10 weeks during June through August, along with the
average price per gallon, and a chart of the gasoline sales time series with a fitted trendline
(Excel file Gasoline Sales). During the summer months, it is not unusual to see an increase in
sales as more people go on vacations. The chart shows a linear trend, although R² is not very
high. The trendline is:

sales = 4,790.1 + 812.99 × week

Using this model, we would predict sales for week 11 as:

sales = 4,790.1 + 812.99(11) = 13,733 gallons
In the gasoline sales data, we also see that the average price per gallon changes each
week, and this may influence consumer sales. Therefore, the sales trend might not simply be
a factor of steadily increasing demand, but it might also be influenced by the average price
per gallon. The average price per gallon can be considered a causal variable. Multiple
linear regression provides a technique for building forecasting models that incorporate not
only time but also other potential causal variables.
INCORPORATING CAUSAL VARIABLES IN A REGRESSION FORECASTING
MODEL:
For the gasoline sales data, we can incorporate the price/gallon by using two
independent variables. This results in the multiple regression model

sales = β0 + β1 × week + β2 × price/gallon

The results are shown in Figure 9.28, and the regression model is

sales = 72333.08 + 508.67 × week − 16463.2 × price/gallon
Notice that the R² value is higher when both variables are included, explaining more
than 86% of the variation in the data. If the company estimates that the average price
for the next week will drop to $3.80, the model would forecast the sales for week 11 as

sales = 72333.08 + 508.67(11) − 16463.2(3.80) = 15,368 gallons
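Since only the fitted coefficients are reported here (the underlying data sit in the Excel file), the week-11 forecast can be reproduced with a small function, as sketched below.

```python
# Sketch: reproduce the week-11 forecast from the regression coefficients above.
def forecast_sales(week, price_per_gallon):
    """Fitted causal model: sales = b0 + b1*week + b2*price_per_gallon."""
    return 72333.08 + 508.67 * week - 16463.2 * price_per_gallon

print(round(forecast_sales(11, 3.80)))   # about 15,368 gallons
```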
To use Analytic Solver Platform, you must perform the following steps:
To define a cell you wish to predict and create a distribution of output values
from your model (which Analytic Solver Platform calls an uncertain function cell),
first select it, and then click on the Results button in the Simulation Model group in the
Analytic Solver Platform ribbon. Choose the Output option and then In Cell.
RUNNING A SIMULATION:
To run a simulation, first click on the Options button in the Options group in the Analytic Solver
Platform ribbon. This displays a dialog (see Figure 12.7) in which you can specify the number of
trials and other options to run the simulation (make sure the Simulation tab is selected). Trials per
Simulation allows you to choose the number of times that Analytic Solver Platform will generate
random values for the uncertain cells in the model and recalculate the entire spreadsheet. Because
Monte Carlo simulation is essentially statistical sampling, the larger the number of trials you use, the
more precise will be the result.
Unless the model is extremely complex, a large number of trials will not unduly tax today's
computers, so we recommend that you use at least 5,000 trials (the educational version restricts
this to a maximum of 10,000 trials). You should use a larger number of trials as the number of
uncertain cells in your model increases so that the simulation can generate representative
samples from all distributions for assumptions. You may run more than one simulation if you
wish to examine the variability in the results.
FIG. 12.7. Analytic Solver Platform Options Dialog
Analytic Solver Platform has alternative sampling methods; the two most
common are Monte Carlo and Latin Hypercube sampling. Monte Carlo sampling
selects random variates independently over the entire range of possible values of the
distribution. With Latin Hypercube sampling, the uncertain variable's probability
distribution is divided into intervals of equal probability, and a value is generated
randomly within each interval. Latin Hypercube sampling results in a more even
distribution of output values because it samples the entire range of the distribution in a
more consistent manner, thus achieving more accurate forecast
statistics (particularly the mean) for a fixed number of Monte Carlo trials. However,
Monte Carlo sampling is more representative of reality and should be used if you are
interested in evaluating the model performance under various what-if scenarios. Unless
you are an advanced user, we recommend leaving the other options at their default
values.
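The difference between the two sampling methods can be sketched outside Analytic Solver Platform. The following illustrative sketch draws from an assumed normal demand distribution both ways; the stratified draw is a simple one-dimensional Latin Hypercube scheme.

```python
# Sketch: plain Monte Carlo sampling versus a 1-D Latin Hypercube scheme.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
trials, mu, sigma = 5000, 100.0, 15.0

# Monte Carlo: independent draws over the whole distribution.
mc = rng.normal(mu, sigma, size=trials)

# Latin Hypercube (1-D): one uniform draw inside each of `trials`
# equal-probability strata, mapped through the normal inverse CDF, then shuffled.
strata = (np.arange(trials) + rng.uniform(size=trials)) / trials
lhs = norm.ppf(strata, loc=mu, scale=sigma)
rng.shuffle(lhs)

print("MC mean:", mc.mean(), " LHS mean:", lhs.mean())  # LHS mean is usually closer to mu
```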
The last step is to run the simulation by clicking the Simulate button in the Solve
Action group. When the simulation finishes, you will see a message “Simulation
finished successfully” in the lower-left corner of the Excel window.
You may specify whether you want output charts to automatically appear after
a simulation is run by clicking the Options button in the Analytic Solver Platform
ribbon, and either checking or unchecking the box Show charts after simulation in the
Charts tab. You may also view the results of the simulation at any time by double-
clicking on an output cell that contains the PsiOutput() function or by choosing
Simulation from the Reports button in the Analysis group in the Analytic Solver
Platform ribbon. This displays a window with various tabs showing different charts
to analyze results.
Newsvendor Model.
1. **Demand Distribution:**
- The demand for the product is assumed to follow a probability distribution. The actual demand is
uncertain and can vary.
2. **Order Quantity (Q):**
- The decision variable is the order quantity, representing the number of units that the retailer orders to
meet customer demand.
3. **Unit Cost and Selling Price:**
- The retailer incurs a cost (c) per unit of the product ordered. The selling price (p) per unit is usually
higher than the unit cost.
4. **Salvage Value (V):**
- If the retailer orders more units than demanded, the excess units may have a lower salvage value (V)
or disposal cost. Salvage value represents the revenue generated from selling excess units, returning
unsold units to the supplier, or other disposal methods.
5. **Shortage Cost (h):**
- If the retailer orders fewer units than demanded, there is a shortage cost (h) associated with the lost
sales, backordering, or other costs related to unmet demand. A minimal simulation sketch that ties
these elements together appears after this list.
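The sketch below is illustrative only: the cost c, price p, salvage value v, shortage cost h, and the normal demand distribution are assumptions, not values from the text. It searches a grid of order quantities by Monte Carlo simulation, in keeping with the simulation theme of this unit.

```python
# Sketch: newsvendor expected profit by simulation (all parameters illustrative).
import numpy as np

def expected_profit(q, c=5.0, p=10.0, v=2.0, h=3.0, trials=20000, seed=0):
    rng = np.random.default_rng(seed)
    demand = rng.normal(100, 20, size=trials).clip(min=0)   # uncertain demand
    sold = np.minimum(q, demand)
    leftover = np.maximum(q - demand, 0)                    # salvaged at v per unit
    short = np.maximum(demand - q, 0)                       # penalized at h per unit
    profit = p * sold + v * leftover - h * short - c * q
    return profit.mean()

# Search a grid of candidate order quantities for the best expected profit.
best_q = max(range(60, 161, 5), key=expected_profit)
print(best_q, round(expected_profit(best_q), 1))
```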
Overbooking Model.
While overbooking can be a profitable strategy when executed judiciously, it requires careful planning
and continuous monitoring to ensure that it aligns with customer expectations and business goals.
Cash Budget Model.
A Cash Budget Model is a financial planning tool that helps organizations forecast and manage their
cash inflows and outflows over a specific period, typically on a monthly or quarterly basis. The primary
goal of a cash budget is to ensure that a business has sufficient liquidity to meet its operational needs,
repay debts, and invest in growth opportunities. This model is crucial for effective cash flow
management and helps businesses avoid liquidity issues.
3. **Operating Expenses:**
- Identify and estimate all operating expenses, such as rent, utilities, salaries, and other costs that
require cash payments.
4. **Loan Payments and Interest:**
- Include any loan payments and interest expenses in the cash budget.
5. **Other Cash Inflows and Outflows:**
- Consider any additional cash inflows or outflows, such as investments, asset purchases, or other
financial activities.
6. **Opening and Closing Balances:**
- Determine the opening cash balance for the period, calculate the net cash flow, and compute the
closing cash balance.
7. **Monitoring and Adjusting:**
- Regularly monitor actual cash flows against the budget and make adjustments as needed. This may
involve revising revenue or expense estimates based on changing circumstances. A minimal
month-by-month sketch of the rolling calculation follows this list.
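The sketch below simply rolls the opening balance forward each month; all monthly figures are illustrative assumptions.

```python
# Sketch: rolling cash budget (illustrative inflows, outflows, and opening balance).
months = ["Jan", "Feb", "Mar"]
inflows = [50_000, 42_000, 55_000]      # e.g., cash sales and collections
outflows = [46_000, 48_000, 43_000]     # operating expenses, loan payments, etc.

balance = 10_000                        # opening cash balance
for m, cash_in, cash_out in zip(months, inflows, outflows):
    net = cash_in - cash_out            # net cash flow for the month
    balance += net                      # closing balance becomes next opening balance
    print(f"{m}: net cash flow {net:+,}  closing balance {balance:,}")
```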
Many decisions involve a choice from among a small set of alternatives with uncertain
consequences. We may formulate such decision problems by defining three things:
1. the decision alternatives that can be chosen,
2. the uncertain events that may occur after a decision is made along with their possible
outcomes, and
3. the consequences associated with each decision and outcome, which are usually
expressed as payoffs.
The outcomes associated with uncertain events (which are often called states of nature)
are defined so that one and only one of them will occur. They may be quantitative or qualitative.
For instance, in selecting the size of a new factory, the future demand for the product would be
an uncertain event. The demand outcomes might be expressed quantitatively in sales units or
dollars. On the other hand, suppose that you are planning a spring-break vacation to Florida in
January; you might define an uncertain event as the weather; these outcomes might be
characterized qualitatively: sunny and warm, sunny and cold, rainy and warm, rainy and cold,
and so on. A payoff is a measure of the value of making a decision and having a particular
outcome occur. This might be a simple estimate made judgmentally or a value computed from
a complex spreadsheet model. Payoffs are often summarized in a payoff table, a matrix whose
rows correspond to decisions and whose columns correspond to events. The decision maker
first selects a decision alternative, after which one of the outcomes of the uncertain event occurs,
resulting in the payoff.
Aggressive (Optimistic) Strategy An aggressive decision maker might seek the option
that holds the promise of the lowest possible cost. For a minimization objective, this
strategy is also often called a minimin strategy; that is, for each decision we find the minimum
payoff that can occur among all outcomes and then choose the decision with the smallest of
these. Aggressive decision makers are often called speculators, particularly in financial arenas,
because they increase their exposure to risk in hopes of increasing their return; while a few may
be lucky, most will not do very well.
Opportunity-Loss Strategy A third approach that underlies decision choices for many
individuals is to consider the opportunity loss associated with a decision. Opportunity loss
represents the “regret” that people often feel after making a nonoptimal decision (I should have
bought that stock years ago!). In general, the opportunity loss associated with any decision and
event is the absolute difference between the payoff of the best decision for that particular outcome
and the payoff for the decision that was chosen. Opportunity losses can be only nonnegative values;
if you get a negative number, then you made a mistake. Once opportunity losses are computed,
the decision strategy is similar to a conservative strategy. The decision maker would select the
decision that minimizes the largest opportunity loss among all outcomes for each decision. For
these reasons, this is also called a minimax regret strategy.
When the objective is to maximize the payoff, we can still apply aggressive, conservative,
and opportunity loss strategies, but we must make some key changes in the analysis.
(1) For the aggressive strategy, the best payoff for each decision would be the largest
value among all outcomes, and we would choose the decision corresponding to the
largest of these, called a maximax strategy.
(2) For the conservative strategy, the worst payoff for each decision would be the
smallest value among all outcomes, and we would choose the decision
corresponding to the largest of these, called a maximin strategy.
(3) For the opportunity-loss strategy, we need to be careful in calculating the
opportunity losses. With a maximize objective, the decision with the largest value
for a particular event has an opportunity loss of zero. The opportunity losses
associated with the other decisions are the absolute differences between their payoffs
and that largest value. The actual decision rule is the same as when payoffs are costs:
choose the decision that minimizes the maximum opportunity loss, as the sketch
following this list illustrates.
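The sketch below applies the three strategies to a small illustrative payoff table in which larger payoffs are better; the numbers are assumptions, not from Table 16.1.

```python
# Sketch: maximax, maximin, and minimax-regret choices from a payoff table.
import numpy as np

payoffs = np.array([[ 80,  40, -20],    # decision 1, one column per outcome
                    [ 60,  50,  10],    # decision 2
                    [ 30,  30,  30]])   # decision 3

maximax = payoffs.max(axis=1).argmax()          # aggressive: best of the best payoffs
maximin = payoffs.min(axis=1).argmax()          # conservative: best of the worst payoffs

regret = payoffs.max(axis=0) - payoffs          # opportunity loss per decision/outcome
minimax_regret = regret.max(axis=1).argmin()    # smallest maximum regret

print(maximax, maximin, minimax_regret)         # row indices of the chosen decisions
```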
Many decisions require some type of tradeoff among conflicting objectives, such as risk
versus reward. A simple decision rule can be used whenever one wishes to make an optimal
tradeoff between any two conflicting objectives, one of which is good and one of which is bad:
choose the alternative that maximizes the ratio of the good objective to the bad. First, display the
tradeoffs on a chart with the “good” objective on the x-axis and the “bad” objective on the y-axis,
making sure to scale the axes properly to display the origin (0,0). Then graph the tangent line to the
tradeoff curve that goes through the origin. The point at which the tangent line touches the
curve (which represents the smallest slope) represents the best return-to-risk tradeoff.
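A minimal sketch of this ratio rule, with illustrative return and risk values for three hypothetical alternatives:

```python
# Sketch: pick the alternative with the largest return-to-risk ratio.
candidates = {"A": (8.0, 4.0), "B": (12.0, 10.0), "C": (6.0, 2.0)}  # (return, risk)

best = max(candidates, key=lambda k: candidates[k][0] / candidates[k][1])
print(best, candidates[best][0] / candidates[best][1])   # "C" with ratio 3.0
```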
TABLE 16.1. Summary of Decision Strategies Under Uncertainty
EVALUATING RISK:
An implicit assumption in using the average payoff or expected value strategy is that
the decision is repeated a large number of times.
Decision Trees,
Decision trees may be created in Excel using Analytic Solver Platform. Click the
Decision Tree button. To add a node, select Add Node from the Node drop-down list, as shown
in Figure 16.2. Click on the radio button for the type of node you wish to create (decision or
event). This displays one of the dialogs shown in Figure 16.3. For a decision node, enter
the name of the node and the names of the branches that emanate from the node (you may also
add additional ones). The Value field can be used to input cash flows, costs, or revenues that
result from choosing a particular branch. For an event node, enter the name of the node and
branches. The Chance field allows you to enter the probabilities of the events.
We may use Excel data tables to investigate the sensitivity of the optimal decision to
changes in probabilities or payoff values.
When we deal with uncertain outcomes, it is logical to try to obtain better information
about their likelihood of occurrence before making a decision. The value of information
represents the improvement in the expected return that can be achieved if the decision maker
is able to acquire, before making a decision, additional information about the future event that
will take place. In the ideal case, we would like to have perfect information, which tells us
with certainty what outcome will occur. Although this will never occur, it is useful to know the
value of perfect information because it provides an upper bound on the value of any
information that we may acquire. The expected value of perfect information (EVPI) is the
expected value with perfect information (assumed at no cost) minus the expected value
without any information; again, it represents the most you should be willing to pay for perfect
information.
The expected opportunity loss represents the average additional amount the decision
maker would have achieved by making the right decision instead of a wrong one. To find the
expected opportunity loss, we create an opportunity-loss table, as discussed earlier in this
chapter, and then find the expected value for each decision. It will always be true that the
decision having the best expected value will also have the minimum expected opportunity loss.
The minimum expected opportunity loss is the EVPI.
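A short sketch of the EVPI calculation follows, reusing an illustrative payoff table with assumed outcome probabilities; it also confirms that the minimum expected opportunity loss equals the EVPI.

```python
# Sketch: expected value of perfect information (EVPI) for an illustrative table.
import numpy as np

payoffs = np.array([[ 80,  40, -20],
                    [ 60,  50,  10],
                    [ 30,  30,  30]])
probs = np.array([0.3, 0.5, 0.2])                     # assumed outcome probabilities

expected_values = payoffs @ probs                     # expected value per decision
ev_best = expected_values.max()                       # best decision without information
ev_perfect = (payoffs.max(axis=0) * probs).sum()      # best decision for each outcome
evpi = ev_perfect - ev_best

# Equivalently, EVPI equals the minimum expected opportunity loss.
regret = payoffs.max(axis=0) - payoffs
print(evpi, (regret @ probs).min())                   # both print 10.0 here
```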
If assessments of event probabilities are available, these can be used to compute the
expected utility and identify the best decision. It can be rather difficult to compute a
utility function, especially for situations involving a large number of payoffs.
Because most decision makers typically are risk averse, we may use an exponential
utility function to approximate the true utility function.
The exponential utility function is

U(x) = 1 − e^(−x/R)

where R is a parameter that reflects the decision maker's risk tolerance.
Figure 16.14 shows several examples of U(x) for different values of R. Notice that all these functions
are concave and that as R increases, the functions become flatter, indicating more tendency toward
risk neutrality.
One approach to estimating a reasonable value of R is to find the maximum payoff
$R for which the decision maker is willing to take an equal chance on winning $R or losing
$R/2. The smaller the value of R, the more risk averse is the individual. For
instance, would you take a bet on winning $10 versus losing $5? How about
winning $10,000 versus losing $5,000? Most people probably would not worry about taking
the first gamble but might definitely think twice about the second. Finding one's maximum
comfort level establishes the utility function.
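A small sketch of the exponential utility function and the R-estimation benchmark described above; the value of R is illustrative.

```python
# Sketch: exponential utility U(x) = 1 - exp(-x/R) and an expected-utility check.
import math

def utility(x, R):
    return 1 - math.exp(-x / R)

def expected_utility(outcomes, probs, R):
    return sum(p * utility(x, R) for x, p in zip(outcomes, probs))

R = 10_000
# Equal chance of winning R or losing R/2, the benchmark gamble used to estimate R.
print(expected_utility([R, -R / 2], [0.5, 0.5], R))   # close to zero
print(utility(0, R))                                  # utility of doing nothing = 0
```

At the decision maker's comfort level R, the 50-50 gamble comes out close to break-even in utility terms, which is exactly how the benchmark is intended to work.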
Fig. 16.14. Examples of Exponential Utility Functions
Recent Trends in Business Analytics:
Embedded and collaborative business intelligence,
Trends in embedded and collaborative business intelligence (BI) have been evolving to meet the
growing demand for data-driven decision-making and enhanced user experiences. Some of the
key trends are described below:
1. **Embedded Analytics and BI:**
- **Integration with Applications:** There is a growing trend of embedding analytics
directly into other business applications. This integration allows users to access analytics and
BI tools seamlessly within the applications they already use, promoting a more unified user
experience.
- **Customization for Specific Use Cases:** Businesses are customizing embedded
analytics to fit specific use cases and industry needs. This involves tailoring analytics
solutions to the unique requirements of different user groups and verticals.
2. **Collaborative BI:**
- **Social Collaboration Features:** Collaborative BI platforms incorporate social
collaboration features, enabling users to share insights, annotations, and comments within the
BI environment. This fosters teamwork, knowledge sharing, and more informed decision-
making.
- **Real-Time Collaboration:** There is a shift towards real-time collaboration, allowing
multiple users to work on and interact with BI content simultaneously. This trend supports
dynamic discussions and collaborative analysis.
- **Integration with Communication Tools:** Integration with communication tools,
such as messaging apps and collaboration platforms, enhances the flow of information and
insights among team members.
3. **Self-Service BI:**
- **Empowering Non-Technical Users:** The emphasis on self-service BI continues,
with a focus on empowering non-technical users to create their own reports, dashboards, and
visualizations. This trend reduces dependence on IT teams for routine analytics tasks.
- **User-Friendly Interfaces:** BI tools are becoming more user-friendly with intuitive
interfaces, drag-and-drop functionality, and natural language processing. This facilitates
easier adoption and usage by business users.
4. **Mobile BI:**
- **Mobile-First Approach:** With the increasing use of smartphones and tablets, BI
vendors are adopting a mobile-first approach. Mobile BI enables users to access and interact
with analytics on the go, providing flexibility and ensuring that decision-makers are not tied
to their desks.
- **Responsive Design:** BI tools are incorporating responsive design principles to ensure
a consistent and user-friendly experience across various devices, screen sizes, and
orientations.
5. **AI and Machine Learning Integration:**
- **Automated Insights:** AI and machine learning capabilities are being integrated into
BI tools to automate insights generation. This helps users discover patterns, trends, and
anomalies without explicitly querying the data.
- **Predictive Analytics:** Predictive analytics, powered by machine learning algorithms,
is becoming more prevalent in BI. Businesses are using these capabilities to anticipate future
trends and make proactive decisions.
6. **Data Governance and Security:**
- **Focus on Data Governance:** As data privacy regulations become more stringent,
there is an increased focus on data governance within BI platforms. This includes features for
data lineage, data quality monitoring, and access controls to ensure compliance.
- **Embedded Security Measures:** Security features are being embedded directly into
BI solutions to protect sensitive information. This includes encryption, authentication, and
authorization mechanisms.
7. **Cloud-Based BI:**
- **Rise of Cloud-Based Solutions:** Cloud-based BI solutions are gaining popularity
due to their scalability, flexibility, and ease of implementation. Organizations are adopting
cloud-based BI to leverage the advantages of cloud infrastructure.
- **Hybrid Deployments:** Some businesses are opting for hybrid BI deployments,
combining on-premises and cloud solutions to meet specific performance, security, or
compliance requirements.
8. **Natural Language Processing (NLP):**
- **Querying with Natural Language:** NLP capabilities enable users to interact with BI
tools using natural language queries. This simplifies the process of data exploration and
analysis, making BI more accessible to a broader audience.
9. **Continuous Analytics:**
- **Real-Time Analytics:** The demand for real-time analytics is increasing, particularly
in industries where immediate insights are crucial. Continuous analytics enables
organizations to monitor and analyze data streams in real-time, leading to more timely
decision-making.
10. **Integration with Big Data Technologies:**
- **Handling Large Datasets:** As organizations deal with larger and more complex
datasets, BI tools are integrating with big data technologies to efficiently process and analyze
vast amounts of data.
- **Support for Data Variety:** BI platforms are evolving to handle diverse data types,
including structured, semi-structured, and unstructured data, enabling a more comprehensive
view of business information.
These trends collectively reflect a shift towards more user-centric, collaborative, and
intelligent BI solutions. Keep in mind that the field of embedded and collaborative BI is
dynamic, and ongoing technological advancements will likely influence the trajectory of
these trends.
Visual data recovery,
Trends in visual data recovery have been shaped by advancements in computer vision, machine
learning, and image processing technologies. This field is dynamic, so it is important to stay
updated with the latest research and industry advancements as new trends emerge over time.
Data Storytelling and Data Journalism
Data storytelling and data journalism have continued to evolve with advancements in technology
and changing data consumption patterns. Some of the key trends are described below:
1. **Interactive Data Visualizations:**
- **User Engagement:** Data storytelling increasingly involves interactive data
visualizations. These visuals allow users to explore data on their own, enhancing engagement
and understanding. Tools like D3.js, Tableau, and Power BI facilitate the creation of
interactive dashboards.