Prof. Jashaswi - Mandal - Descriptive Analytics Data Visualization - 12.06.24
CHAPTER 4
By Amar Sahay
(A Business Expert Press Book)
Harvard Business Publishing distributes in digital form the individual chapters from a wide selection of books on business from
publishers including Harvard Business Press and numerous other companies. To order copies or request permission to
reproduce materials, call 1-800-545-7685 or go to http://www.hbsp.harvard.edu. No part of this publication may be reproduced,
stored in a retrieval system, used in a spreadsheet, or transmitted in any form or by any means – electronic, mechanical,
photocopying, recording, or otherwise – without the permission of Harvard Business Publishing, which is an affiliate of Harvard
Business School.
This document is authorized for use only in Prof Jashaswi Mandal 's A24_Payable - Statistics - 12.06.24 at NITIE - National Institute of Industrial Engineering from Jun 2024 to Dec 2024.
CHAPTER 4
Descriptive Analytics: Data Visualization
Chapter Highlights
Introduction
Basic Concepts in Data Visualization
Presenting Data: Collection and Presentation of Data
Organizing Data: An Example
Summarizing Quantitative Data: Frequency Distribution
Histogram: A Graph of Frequency Distribution
Example: Histogram: Summarizing Data and Examining the Distribution
Graphical Summary of Data
Graphical Display of Variation
Data visualization: Conventional and Simple Techniques
Stem-and-Leaf Plot
Box Plots
More Applications of Box Plots
Dot Plots
Bar Charts, a Cluster Bar Chart, and Stacked Bar Chart
Describing, Summarizing, and Graphing Categorical Variables
Creating a Bar Chart from a Simple Tally
Example: Cross Tabulation with Two and Three Categorical Variables
Pie Charts
Interval Plots
Example: Interval Plot Showing the Variation in Sample Data
Time Series Plots
Sequence Plot: Plot of Process Data
Example: Sequence Plot
BUSINESS ANALYTICS
Introduction
Data visualization is the presentation of data in visual or graphical form. Graphical displays are extremely helpful in detecting patterns, trends, and correlations that are not usually apparent from the raw data; when data are not in visual form, such trends and patterns often go undetected.
Data visualization is an integral part of business intelligence (BI). Most BI application software places heavy emphasis on data visualization and has strong visualization capabilities. One reason for the popularity of visualization tools is that they are easy to use and understand and do not require the extensive training that statistical software does. A number of popular statistical software packages are also available that emphasize analysis and modeling along with graphing capabilities. Modern visualization tools are typically easier to operate than traditional statistical analysis software or earlier versions of BI software. This has led to a rise in lines of business implementing data visualization tools on their own, without support from IT.
Data visualization tools and software now have advanced capabilities that go beyond the standard charts and graphs available in Microsoft Excel and other standard statistical software. Current data visualization software can display data as graphs and charts contained in dashboards that present multiple views of the data. These dashboards are extremely helpful decision-making tools. A number of specialized graphs including
infographics, heat maps, geographic maps, and detailed bar and pie charts can be created using visualization software. In many cases, the visuals created have interactive capabilities that allow for manipulating data, querying, and analysis.
Data visualization software plays an important role in big data and advanced analytics projects. Businesses now collect massive amounts of data, and the visualization and analysis of these data are referred to as big data analysis. Visualizing big data requires specially designed software that provides a quick and easy overview through data dashboards.
The success of the two leading software vendors in the BI space, Tableau and Qlik, has moved other vendors toward a more visual approach in their software, and virtually all big data software in the BI space now has strong data visualization functionality. This does not mean that only software designed for big data, such as Tableau and Qlik, can be used for data visualization. Standard statistical packages, including MINITAB, SAS, STATS PRO, SPSS, and others, along with the widely used spreadsheet program Excel, are also widely used for data visualization. The fundamentals of visuals and graphics are the same whether they are created with standard statistical software or with big data software; the difference lies in capability. Big data visualization software can handle massive amounts of data and can create dashboards that provide multiple views of the data in a single display. In this chapter, we provide the fundamentals of data visualization along with a number of examples of visuals that can be created from data. We also discuss the applications and interpretation of these visuals.
• Tables, and
• Graphs
Class-Interval Frequency
45 ≤ X < 48
This means that the values in this class interval include the value 45 but not 48. The value 45 is known as the lower class boundary or lower class limit, and the value 48 is known as the upper class boundary or upper class limit.
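The half-open convention 45 ≤ X < 48 translates directly into code. The following is a minimal Python sketch, assuming equal-width classes that start at 45 with a width of 3; the function names are hypothetical and not taken from any particular package.

```python
def in_class(x, lower, upper):
    """True if x lies in the half-open class lower <= x < upper."""
    return lower <= x < upper


def class_index(x, start, width):
    """0-based index k of the class [start + k*width, start + (k+1)*width) holding x."""
    return int((x - start) // width)


print(in_class(45, 45, 48))    # True: the lower limit belongs to the class
print(in_class(48, 45, 48))    # False: 48 starts the next class
print(class_index(48, 45, 3))  # 1, i.e. the class 48 <= X < 51
```

Because the upper limit is excluded, every value falls into exactly one class, which is why adjacent classes can share a boundary value.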
There are several other possible ways of grouping the data or constructing frequency distributions from the information in Table 4.2. The following information is helpful when grouping data or forming a frequency distribution:
possible for the same set of data. The grouping can be performed easily using most statistical software packages.
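As an illustration of how software performs the grouping, here is a minimal Python sketch that builds a frequency distribution with classes of equal width and reports the frequency, relative frequency, and cumulative frequency of each class. The speeds below are hypothetical values, not the data of Table 4.2, and the function name is illustrative.

```python
from collections import Counter


def frequency_distribution(data, start, width, n_classes):
    """Group data into n_classes half-open classes [lower, lower + width).

    Returns (lower, upper, frequency, relative frequency, cumulative frequency)
    for each class, in order.
    """
    counts = Counter(int((x - start) // width) for x in data)
    if any(not 0 <= k < n_classes for k in counts):
        raise ValueError("some values fall outside the classes")
    rows, cumulative = [], 0
    for k in range(n_classes):
        lower = start + k * width
        freq = counts[k]          # Counter returns 0 for empty classes
        cumulative += freq
        rows.append((lower, lower + width, freq, freq / len(data), cumulative))
    return rows


# Hypothetical driving speeds (mph), grouped into 45 <= X < 48, 48 <= X < 51, ...
speeds = [46, 52, 55, 58, 61, 63, 64, 66, 70]
for lower, upper, f, rel, cum in frequency_distribution(speeds, 45, 3, 9):
    print(f"{lower} <= X < {upper}: {f:2d} {rel:6.1%} {cum:3d}")
```

Changing `start` or `width` gives one of the other possible groupings of the same data.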
The selling prices of 300 homes sold over the past six months in a certain city are summarized in the form of a histogram in Figure 4.2. The bars show the
[Figure: Histogram of driving speed (mph); vertical axis: Frequency]
[Figure 4.2: Histogram of home price ($000); vertical axis: Frequency]
[Figure: Histogram of home price ($000) with bars showing percent]
[Figure: Histogram of home price ($000); vertical axis: Frequency]
Figure 4.6 Data sets A and B with same mean but different variations (panel title: two data sets with the same mean but different standard deviations; mean and median marked)
Figure 4.7 Data sets A and B with same variation but different means (panel title: two data sets with different means but the same standard deviation)
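The situations in Figures 4.6 and 4.7 can be mimicked numerically. In this Python sketch the three small data sets are hypothetical: A and B share a mean but differ in spread (as in Figure 4.6), while A and C share a spread but differ in mean (as in Figure 4.7).

```python
import statistics

# Hypothetical data sets chosen only to mirror Figures 4.6 and 4.7.
a = [48, 49, 50, 51, 52]
b = [40, 45, 50, 55, 60]   # same mean as A, larger spread
c = [58, 59, 60, 61, 62]   # same spread as A, larger mean

for name, data in (("A", a), ("B", b), ("C", c)):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample standard deviation
    print(f"{name}: mean = {mean}, standard deviation = {sd:.3f}")
```

The mean alone cannot distinguish A from B, and the standard deviation alone cannot distinguish A from C, which is why both center and variation need to be displayed.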
Stem-and-Leaf Plot
A stem-and-leaf plot is a very efficient way of displaying data and checking the variation and shape of the distribution. The plot is obtained by dividing each data value into two parts: a stem and a leaf. For example, if the data are two-digit numbers such as 34, 56, and 67, the first digit (the tens digit) is the stem and the second digit (the ones digit) is the leaf. Thus, in the data value 56, 5 is the stem and 6 is the leaf. In a three-digit data value, the first two digits form the stem and the third digit is the leaf.
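The splitting rule can be sketched in Python, assuming non-negative integer data: the last digit of each value is the leaf and the remaining digits are the stem, which covers both the two-digit and three-digit cases described above. Apart from 34, 56, and 67, the sample values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return a list of (stem, leaves) rows for non-negative integers.

    The ones digit is the leaf and the remaining digits form the stem,
    so 56 -> (5, 6) and 231 -> (23, 1). Stems with no leaves are kept
    so that gaps in the data stay visible.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return [(s, rows.get(s, [])) for s in range(min(rows), max(rows) + 1)]


for stem, leaves in stem_and_leaf([34, 56, 67, 52, 58, 41]):
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
```

Sorting the values first keeps the leaves in each row in increasing order, which is the convenient layout mentioned in Example 4.2.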
Example 4.2
The stem-and-leaf plot in Figure 4.8 shows the number of orders received per day by a company. It is convenient to construct the plot using sorted
data. There are three columns in the plot. The first column (labeled 1) shows the cumulative count of observations, the second (middle) column (labeled 2) shows the stem values, and the numbers following the second column (labeled 3) are the leaves. The first row has the following values:
1 9 2
This means that there is one observation in this row, the stem value is 9, and the leaf value is 2. Thus, the first value is 92. The second row also has one value, with a stem value of 10 and a leaf value of 3:
2 10 3
The first column in the second row shows the cumulative count of observations up to this point, which is 2. This means that there are two observations up to and including this row (one in the first row and one in the second row); the stem is 10 and the leaf is 3, making the value in the second row 103.
Refer again to column 1 of Figure 4.8. The values from the top are 1, 2, 5, 7, 8, 11, 15, 22, and 27. This means that there are 27 observations up to and including row 9. The next number is 11, which is enclosed in parentheses: (11). This indicates that there are 11 observations in this row and that this row contains the median value of the data. Below the median row, the cumulative counts start again from the bottom row and accumulate upward. Look at the bottom row, which shows
2 23 18
This indicates that there are two observations in this row, namely 231 and 238.
We can see from the figure that the distribution of the data is left skewed (negatively skewed). The minimum value is 92, the first value in the first row, and the maximum value is 238, the last value. To find the total number of observations, add the observations in the median row (11) to the observations above and below the median row: 27 + 11 + 17 = 55. The stem-and-leaf plot can also be used to obtain the information shown in the second column in Figure 4.8.
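The depth column described above can be reproduced from the number of leaves in each row. The following Python sketch assumes the convention explained in this example (cumulative counts from the top down to the median row, the median row's own count in parentheses, and cumulative counts from the bottom below it) and, as a simplification, treats the row holding position (n + 1)/2, rounded down, as the median row. Only the first nine row counts follow from the text; the remaining counts are hypothetical, chosen so that the totals match 27 + 11 + 17 = 55.

```python
def depth_column(counts):
    """Depth labels for a stem-and-leaf display, top row first.

    counts[i] is the number of leaves in row i.
    """
    n = sum(counts)
    middle = (n + 1) // 2      # position of the middle observation (simplified)
    labels, above = [], 0      # above = observations in earlier rows
    for c in counts:
        if above + c < middle:          # row lies entirely above the median row
            labels.append(str(above + c))
        elif above < middle:            # row contains the middle position
            labels.append(f"({c})")
        else:                           # row lies below: count from the bottom
            labels.append(str(n - above))
        above += c
    return labels


# Leaves per row: the first nine reproduce 1, 2, 5, ..., 27 from the text;
# the rest are hypothetical.
row_counts = [1, 1, 3, 2, 1, 3, 4, 7, 5, 11, 9, 6, 2]
print(depth_column(row_counts))
```

Reading the labels back confirms the arithmetic in the text: 27 above the median row, 11 in it, and 17 below it.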
Box Plots
The box plot displays the smallest and largest values in the data along with the three quartiles Q1, Q2, and Q3. The display of these five numbers (known as the five-number summary) may be used to study the shape of the distribution and draw conclusions from the data. Different types of box plots can be created from the data; some of them are shown as follows.
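The five-number summary behind a box plot can be computed with Python's standard `statistics` module. This is a sketch: the quartiles use the module's `"exclusive"` method, which places the pth quantile at position p(n + 1) of the sorted data; other packages may use a different rule and report slightly different quartiles for the same data.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) of the data."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
print(summary)  # prints (1, 3.0, 6.0, 9.0, 11)
```

In the plot, the box spans Q1 to Q3 with a line at the median, and the whiskers reach toward the minimum and maximum.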
[Figure: Box plot of waiting time (Min = 6.8, Max = 16.6; the value 11.8 is labeled)]
[Figure: Box plots of diameter by day (days 1 to 8)]
[Figure: Box plots of diameter for Sample 1 to Sample 5]
[Figure: Box plots of diameter for Sample 1 to Sample 5 within each of Machines 1 to 4]
of these shafts. The plot can be used to check the consistency and distribution of the diameters with respect to the machines.
Figure 4.12 shows a variation of the box plot in which samples from each of the four machines in production are plotted separately. These plots can be used to check the consistency and distribution of the diameter with respect to each machine. Suppose you want to check the consistency of the diameters of five samples with respect to three machine operators. The plots in Figure 4.13 can be used for this purpose.
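Comparing box plots across groups amounts to comparing each group's center and spread. A minimal Python sketch, using hypothetical diameter readings for two machines (not the data behind Figures 4.12 and 4.13), reports the median (the line inside each box) and the interquartile range (the length of the box) for each group.

```python
import statistics

# Hypothetical diameter readings by machine; not the book's data.
diameters = {
    "Machine 1": [75.00, 75.01, 75.02, 74.99, 75.01],
    "Machine 2": [75.02, 75.03, 75.01, 75.04, 75.02],
}


def box_center_and_width(values):
    """Median and interquartile range (Q3 - Q1) of one group."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return median, q3 - q1


for machine, values in diameters.items():
    median, iqr = box_center_and_width(values)
    print(f"{machine}: median = {median:.3f}, IQR = {iqr:.3f}")
```

A group whose median sits apart from the others, or whose IQR is much wider, is the one whose box would stand out in the plot.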
Dot Plots
A dot plot may be used to study the shape of a distribution or to compare two or more sets of data. In a dot plot, the horizontal axis
[Figure 4.13: Box plots of diameter for Sample 1 to Sample 5 within each of Operators A, B, and C]
Example 4.3
Figure 4.14 shows the dot plot of data representing the spot speeds of 100 cars in a 65 mph speed limit zone. The dot plot in Figure 4.15 shows the number of cars sold by a dealership on each of 100 days. Each number is the total sold at four different locations of the same dealership. The horizontal axis shows the number of cars sold
[Figure 4.14: Dot plot of driving speed (mph)]
[Figure 4.15: Dot plot of number of cars sold]
and each dot represents one day. The first value on the horizontal axis is 2, with three dots above it. This means that on three of the days, two cars were sold. The total number of dots is 100, one for each of the 100 days.
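A dot plot is essentially a count of how often each value occurs, drawn as stacked dots. The Python sketch below prints a sideways text version (values down the left, one asterisk per observation) so that it needs no plotting library; the daily sales figures are hypothetical.

```python
from collections import Counter


def text_dot_plot(values):
    """Print one row per distinct value with one '*' per observation.

    This is a dot plot rotated 90 degrees; returns the counts used.
    """
    counts = Counter(values)
    for value in sorted(counts):
        print(f"{value:>3} | {'*' * counts[value]}")
    return counts


# Hypothetical number of cars sold on each of ten days.
daily_sales = [2, 2, 2, 5, 6, 6, 7, 9, 9, 12]
counts = text_dot_plot(daily_sales)
```

The row for 2 has three asterisks, meaning that on three different days exactly two cars were sold.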
Bar Charts
Bar charts are among the most widely used charts for displaying categorical data. They can be used to display monthly or quarterly sales, revenue, and profits for a company. Figure 4.16 shows the monthly sales of a company, and Figure 4.17 shows a variation of the bar chart.
[Figure 4.16: Bar chart of sales by month, January to September]
Figure 4.17 Connected line over the bar chart of sales vs. month (vertical axis: Sales ($000))
(a) A Vertical Bar Chart: Figure 4.18 is a vertical bar chart of the gold price from 1975 to 2011.
This chart is useful for visualizing the trend as well as the percent increase or decrease in value over the years. For example, the percent increase in the price of gold (per ounce) between 1980 and 2011 can be determined as follows. The price in 1980 was $594.90 per ounce and the price in 2011 was $1,680.00 per ounce. Therefore, the percent increase = (1680 − 594.90)/594.90 × 100 = 182.4%.
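The percent-change calculation above can be written as a small Python function; the two prices are the ones quoted in the text.

```python
def percent_change(old, new):
    """Percent change from old to new: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


# Gold price per ounce in 1980 and 2011, as quoted in the text.
print(round(percent_change(594.90, 1680.00), 1))  # prints 182.4
```

A negative result indicates a percent decrease, so the same function covers both directions.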
(b) A Cluster Bar Chart: A cluster bar chart can be used to compare data across categories. An example of a cluster bar chart showing zone-wise
[Figure 4.18: Vertical bar chart of gold price ($/ounce) by year, 1975 to 2011]
[Figure: Cluster bar chart of values for Zones 1, 2, and 3 by quarter (Quarters 1 to 4)]
[Figure: Bar chart by quarter (1 to 4) within year (1 to 4)]
3 4 5 4 3 2 1 4 3 0 0 0 3 3 1 1 1 1 4 4 4 5 4
5 5 3 2 3 3 4 4 4 4 4 5 3 2 4 5 3 1 4 5 5 0 0
2 3 5 4 0 0 0 3 4 3 2 4 4 4 4 4 5 3 3 0 4 4 3
3 5 4 4 5 3 3 2 2 5 4 3 2 1 1 2 3 4 5 4 3 2 1
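The rating values listed above can be turned into the counts and percentages that a bar chart displays. In this Python sketch the values are copied from the listing; because the table's heading was lost in extraction, it is an assumption that they are ratings on a 0 to 5 scale of the kind summarized in Figure 4.22, and they may be only part of the full data set.

```python
from collections import Counter

# The rating values listed above, copied row by row.
RATINGS_TEXT = """
3 4 5 4 3 2 1 4 3 0 0 0 3 3 1 1 1 1 4 4 4 5 4
5 5 3 2 3 3 4 4 4 4 4 5 3 2 4 5 3 1 4 5 5 0 0
2 3 5 4 0 0 0 3 4 3 2 4 4 4 4 4 5 3 3 0 4 4 3
3 5 4 4 5 3 3 2 2 5 4 3 2 1 1 2 3 4 5 4 3 2 1
"""
RATINGS = [int(token) for token in RATINGS_TEXT.split()]

tally = Counter(RATINGS)   # simple tally: rating -> count
n = len(RATINGS)
for rating in sorted(tally):
    count = tally[rating]
    print(f"rating {rating}: count {count:3d}  percent {count / n:6.1%}")
```

The count column gives the heights of a bar chart like Figure 4.22(a); the percent column gives a percent version like Figure 4.22(b).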
Figure 4.22 (a) Bar chart of Product 1 rating (counts for ratings 0 to 5). (b) Bar chart of Product 1 rating (bars showing percent within all data)
Figure 4.23 (a) Bar chart of Product 2 rating (categories: Excellent, Fair, Good, Poor, Satisfactory, Very good). (b) Bar chart of Product 2 rating (bars showing percent within all data)
[Figure: Cluster bar chart of count by major within gender (Female, Male)]
[Figure: Cluster bar chart of count by major within employment status (Employed, Self-employed)]
Major: 1 = Computer science, 2 = Engineering, 3 = Social science, 4 = Business, 5 = Other
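Clustered bar charts such as these are drawn from a cross tabulation, that is, a count of each combination of two categorical variables. A minimal Python sketch with hypothetical (gender, major) records; the records and variable names are illustrative only.

```python
from collections import Counter

# Hypothetical survey records of (gender, major); not the book's data.
records = [
    ("Female", "Business"), ("Female", "Engineering"), ("Male", "Business"),
    ("Male", "Computer science"), ("Female", "Business"), ("Male", "Other"),
]

crosstab = Counter(records)   # count of each (gender, major) pair
genders = sorted({g for g, _ in records})
majors = sorted({m for _, m in records})

# One row per major, one column per gender; missing pairs count as 0.
print("Major".ljust(18) + "".join(g.rjust(8) for g in genders))
for major in majors:
    print(major.ljust(18) + "".join(str(crosstab[(g, major)]).rjust(8) for g in genders))
```

Each cell of the table is the height of one bar in the clustered chart.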
Pie Charts
A pie chart is used to show the relative magnitudes of the parts of a whole. In this chart, the relative frequency of each group of data is plotted. A circle is constructed and divided into distinct sections, each representing one group of data. The size of each section is found by multiplying its relative frequency by the total angle of the circle: since there are 360° in a circle, the relative frequency of each section is multiplied by 360° to obtain its angle in degrees. Some examples of pie charts and their variations are shown in the following pages.
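The degree calculation can be sketched in Python as follows. For illustration it uses the 2014 energy-consumption shares that appear in this section's energy pie chart (Petroleum 36%, Natural gas 28%, Coal 18%, Renewable energy 10%, Nuclear electric power 8%), as recovered from the extracted figure text; any set of frequencies works the same way.

```python
def pie_angles(frequencies):
    """Central angle (degrees) of each section: relative frequency x 360."""
    total = sum(frequencies.values())
    return {k: v / total * 360 for k, v in frequencies.items()}


# 2014 energy-consumption shares (percent), as read from the pie chart.
energy_2014 = {"Petroleum": 36, "Natural gas": 28, "Coal": 18,
               "Renewable energy": 10, "Nuclear electric power": 8}
for source, angle in pie_angles(energy_2014).items():
    print(f"{source}: {angle:.1f} degrees")
```

Dividing by the total first means the inputs may be raw counts or percentages; the angles always add up to 360°.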
Figure 4.26 shows a simple pie chart of U.S. federal budget expenditures. The chart clearly shows the major categories along with their dollar values and percentages. Several variations of this chart can be created.
Figure 4.27 shows a variation of the pie chart, commonly known as a Bar of Pie chart. Here, a bar chart is created as an extension of the pie chart; its purpose is to show the breakdown of one of the main categories. The pie chart shows energy consumption in 2014 by energy source. Renewable energy accounts for 10 percent of the total, and this category is made up of several sources whose percentages are shown in the bar chart.
[Figure: Bar of Pie chart of 2014 energy consumption by source: Petroleum 36%, Natural gas 28%, Coal 18%, Renewable energy 10%, Nuclear electric power 8%; the bar breaks renewable energy into wind, hydroelectric, biomass waste, wood, and biofuels]
Figure 4.28 displays a Pie of Pie chart. In this chart, the bar is replaced with a second pie chart that shows the proportions within the category of interest.
Interval Plots
The interval plot displays means and/or confidence intervals for one or more variables. This plot is useful for assessing the central tendency and variability of data. The default confidence level is 95 percent; however, this can be changed. We will demonstrate the interval plot using production data for beverage cans. These data contain the amount of beverage in 16 oz. cans from five different production lines. The operations manager suspects that the mean content of the cans differs from line to line. He randomly selected five cans from each line and measured the contents. The interval plot for the five production lines is shown in Figure 4.29.
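Each interval in such a plot is typically the sample mean plus or minus a t-based margin of error computed from that sample's own standard deviation. The Python sketch below assumes five cans per line, as in this example, and hard-codes the two-sided 95 percent t critical value for 4 degrees of freedom (about 2.776) so that only the standard library is needed; the can contents are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, t_crit=2.776):
    """Return (lower, mean, upper) of a t-based confidence interval.

    The default t_crit is the two-sided 95% t value for 4 degrees of
    freedom (a sample of 5); pass another value for other sample sizes
    or confidence levels. Uses this sample's own standard deviation.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean - half_width, mean, mean + half_width


# Hypothetical contents (oz.) of five cans from one production line.
line_1 = [16.0, 16.1, 16.2, 16.1, 16.0]
low, mean, high = mean_confidence_interval(line_1)
print(f"mean {mean:.4f} oz., 95% CI ({low:.4f}, {high:.4f})")
```

Repeating the calculation for each line and drawing the intervals side by side gives a plot like Figure 4.29; intervals that do not overlap suggest the line means differ.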
[Figure 4.29: Interval plot of can content (oz.) by production line (lines 1 to 5). Individual standard deviations were used to calculate the intervals.]
[Figure: Interval plot of piston ring diameter by sample (samples 1 to 20). Individual standard deviations were used to calculate the intervals.]
[Figure: Time series plot of demand (index 1 to 60)]
[Figure: Time series plot of sales by week T (1 to 60), with quarters marked]
Figure 4.33 A multiple time series plot showing sales and forecast
forecast follows the trend in the sales. Figure 4.34 shows a seasonal pattern in furnace filter demand, and Figure 4.35 shows an increasing trend in sales over time.
In all of the time series plots above, the trends and patterns cannot be seen unless the data are plotted.
[Figure: Time series plot (index 1 to 45)]
[Figure: Time series plot of sales ($000) by week (weeks 1 to 110)]
Example 4.11
The data in Table 4.11 list the deviations (in units of 0.00025 inch) of the diameters of 90 machined shafts from the target value. In these data, 0 means that the measured diameter was right on target, 2 means that the measured diameter was 0.0005 inch above the target value, and 3 means that it was 0.00075 inch above the target value. We constructed a sequence plot of the data and interpreted the results. Figures 4.36 and 4.37 show two variations of the sequence plot.
Figure 4.36 shows large deviations for part numbers 27, 30, 44, 45, and 72. The rest of the measurements do not show large deviations. To see whether all the measurements are within the specified limits, we can also plot the specification limits (see Figure 4.37).
Suppose that the specification limits on the shaft diameter are 2 ± 0.0025 inch. This means that in Figure 4.37 the target value of 2 inches is coded 0, the upper limit is +10 (since 0.00025 × 10 = 0.0025), and the lower limit is −10. Figure 4.37 shows the sequence plot with these specification limits. From this plot, you can see that part numbers 27, 44, and 72 are outside the specification limits. At this stage, identifying the problems and taking corrective action can bring the process under control.
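The coding and the specification check can be sketched in Python. The constants come from this example (0.00025 inch per unit, target 2 inches, limits of ±10 units), while the six deviations are hypothetical stand-ins, since Table 4.11 is not reproduced here. The sketch assumes that a deviation of exactly ±10 is still within specification.

```python
UNIT_INCH = 0.00025      # one coded unit, in inches
TARGET_INCH = 2.0        # target diameter, in inches
SPEC_LIMIT_UNITS = 10    # +/-10 units = +/-0.0025 inch


def out_of_spec(deviations, limit=SPEC_LIMIT_UNITS):
    """1-based part numbers whose coded deviation lies outside -limit..+limit."""
    return [part for part, d in enumerate(deviations, start=1) if abs(d) > limit]


def to_diameter(coded):
    """Convert a coded deviation back to a diameter in inches."""
    return TARGET_INCH + coded * UNIT_INCH


# Hypothetical coded deviations for six parts (not Table 4.11).
sample = [0, 2, 3, -4, 12, -11]
print(out_of_spec(sample))  # prints [5, 6]
print(to_diameter(2))       # a coded 2 is 0.0005 inch above target
```

Flagging part numbers this way is the numerical counterpart of reading points outside the limit lines on the sequence plot.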
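[Figure 4.36: Sequence plot of deviation from target by part number (1 to 90)]
[Figure 4.37: Sequence plot of deviation from target by part number (1 to 90), with specification limits at +10 and −10 and the target at 0]
[Figure 4.38: Sequence plot of delivery time by delivery number (1 to 120), with a line at 15 minutes]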
From Figure 4.38, we see that the delivery times vary considerably. In some places, the process shows little variation; in others, it varies significantly. A line is drawn at 15 minutes to show the target value, so values above this line are deliveries that took longer than 15 minutes. There are 13 such deliveries, or 10.8 percent (13/120 = 0.108; 0.108 × 100 = 10.8), approximately 11 percent. This amounts to roughly 108,000 missed deliveries per million deliveries. The pizza chain needs to study the causes of variation to stabilize this process and meet the target delivery time.
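The percentage and the per-million figure can be computed as follows. This Python sketch uses stand-in delivery times constructed to reproduce the counts in the text (13 of 120 deliveries over 15 minutes); the actual times are not listed here.

```python
def exceedance(times, target=15):
    """Count deliveries strictly above target, their percentage,
    and the equivalent number per million deliveries."""
    late = sum(1 for t in times if t > target)
    fraction = late / len(times)
    return late, fraction * 100, fraction * 1_000_000


# Stand-in data reproducing the text's counts: 13 late out of 120.
times = [16] * 13 + [12] * 107
late, pct, per_million = exceedance(times)
print(late, round(pct, 1), round(per_million))  # prints 13 10.8 108333
```

The unrounded rate is about 108,333 per million, which the text rounds to roughly 108,000.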
[Figure: Time series plot of gold price ($/ounce) by year]
Area Graph
The area graph is used to examine trends in multiple time series as well as each series' contribution to the total. The area graph in Figure 4.40 shows the monthly production of crude oil (in thousands of barrels) from 1920 to 2014.
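In an area graph each series is drawn as a band stacked on top of the bands before it, so the top edge of the last band traces the total. A minimal Python sketch of that stacking, plus each series' share of the total, using hypothetical values for three series over three periods:

```python
# Hypothetical values for three series over three periods; not the book's data.
series = {
    "Region A": [10, 12, 15],
    "Region B": [5, 6, 5],
    "Region C": [1, 2, 4],
}


def stacked_layers(series):
    """Top edge of each band when series are stacked in the given order."""
    tops, running = {}, None
    for name, values in series.items():
        if running is None:
            running = list(values)
        else:
            running = [r + v for r, v in zip(running, values)]
        tops[name] = running
    return tops


tops = stacked_layers(series)
totals = tops["Region C"]   # the last band's top edge is the period total
shares = {name: [v / t for v, t in zip(values, totals)]
          for name, values in series.items()}
print(tops)
```

The thickness of each band in a period is that series' value, and its share is the band thickness divided by the total height.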
[Figure 4.40: Area graph of monthly crude oil production (thousands of barrels), 1920 to 2014]
[Figure: Plot by year, 1984 to 2000; legend entry: USSR]
Example 4.14
Figure 4.42 shows a scatter plot depicting the relationship between sales and advertising expenditure for a company.
From Figure 4.42, we can see a distinct increase in sales associated with higher advertising expenditure. This indicates a positive relationship between the two variables: higher values of one variable tend to be associated with higher values of the other.
Example 4.15
Figure 4.43 shows the relationship between home heating cost and average outside temperature. The plot shows a tendency for the points to follow a straight line with a negative slope. This means that
[Figure 4.42: Scatter plot of sales ($000) vs. advertisement ($000)]
[Figure 4.43: Scatter plot of heating cost vs. average temperature]
there is an inverse, or negative, relationship between heating cost and average temperature: as the average outside temperature increases, the home heating cost goes down. Figure 4.44 shows a weak relationship, or no relationship, between the quality rating and the material cost of a product.
Example 4.16
In Figure 4.45, we have plotted the summer temperature against the amount of electricity used (in millions of kilowatts). The plotted points can be well approximated by a straight line, so we can conclude that the relationship between the two variables is approximately linear.
The linear relationship can be described by fitting a regression line over the scatter plot, as shown in Figure 4.46. The equation of this line is
Figure 4.44 Scatter plot of quality rating and material cost (weak/no relationship)
[Figure 4.45: Scatter plot of electricity used vs. summer temperature]
[Figure 4.46: Scatter plot of electricity used vs. summer temperature with fitted regression line]
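A regression line like the one in Figure 4.46 is usually fitted by ordinary least squares. The fitted equation is not preserved in this text, so the Python sketch below uses illustrative points that lie exactly on y = 1 + 2x, which the fit should recover.

```python
def least_squares_line(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; return (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1


# Illustrative points lying exactly on y = 1 + 2x.
b0, b1 = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # prints 1.0 2.0
```

With real scatter-plot data the points do not lie exactly on a line, and the same formulas give the line that minimizes the sum of squared vertical distances.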
In many cases, the relationship between the two variables under study may be nonlinear. Figure 4.47 shows a plot of the yield of a chemical process at different temperatures.
The scatter plot of temperature (x) and yield (y) shows a nonlinear relationship that is best approximated by a quadratic equation. The equation of the fitted curve in Figure 4.47, obtained using a computer package, is y = −1022 + 320.3x − 1.054x². This equation can be used to predict the yield (y) for a particular temperature (x).
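The fitted curve can be used for prediction directly. This Python sketch evaluates the equation given above and, as an extra step, locates the temperature at which the fitted yield peaks (where the derivative 320.3 − 2(1.054)x is zero). Predictions are only meaningful within the range of temperatures actually observed.

```python
def predicted_yield(temp):
    """Fitted curve from the text: y = -1022 + 320.3x - 1.054x^2."""
    return -1022 + 320.3 * temp - 1.054 * temp ** 2


# The parabola opens downward, so it peaks where 320.3 - 2*1.054*x = 0.
peak_temp = 320.3 / (2 * 1.054)

print(round(predicted_yield(150)))  # prints 23308
print(round(peak_temp, 1), round(predicted_yield(peak_temp)))
```

The peak lies inside the plotted temperature range, which is consistent with the curve rising and then falling in Figure 4.47.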
[Figure 4.47: Scatter plot of yield vs. temperature with fitted quadratic curve (S = 897.204, R-Sq = 97.8%)]
Figures 4.48(a) through (d) show several scatter plots with their correlation coefficients.
Figure 4.48(a) shows a positive correlation between sales and profit, with a correlation coefficient of r = +0.979. Figure 4.48(b) shows a positive relationship between sales and advertisement expenditure, with a calculated correlation coefficient of r = +0.902. Figure 4.48(c) shows a negative relationship between heating cost and average temperature; accordingly, the coefficient of correlation for this plot is negative, r = −0.827. The scatter plot in Figure 4.48(d) indicates a weak relationship between the quality rating and the material cost. This
[Figure 4.48: Scatter plots of (a) profit ($) vs. sales ($); (b) sales ($) vs. advertisement ($); (c) heating cost vs. average temperature; (d) quality rating vs. material cost]
This document is authorized for use only in Prof Jashaswi Mandal 's A24_Payable - Statistics - 12.06.24 at NITIE - National Institute of Industrial Engineering from
Jun 2024 to Dec 2024.
120 BUSINESS ANALYtICS
can also be seen from the coefcient of correlation which shows a value of
r = 0.076. Tese graphs are very helpful in describing bivariate relation-
ships or the relationship between the two quantitative variables and can
be easily created using computer packages such as, MINITAB or EXCEL.
Note that the plots in Figure 4.48(a) and (b) shows strong positive
correlation; (c) shows a negative correlation while (d) shows a weak
correlation.
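The coefficient r quoted for each panel is the Pearson (product-moment) correlation coefficient, r = Sxy / sqrt(Sxx · Syy), where Sxy is the sum of cross-products of deviations from the means and Sxx and Syy are the sums of squared deviations. A minimal sketch of the calculation follows; the small data sets are hypothetical, chosen only to show perfect positive, perfect negative, and moderate correlation, and are not the data behind Figure 4.48.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical illustrative data, not from the book
    x = [1, 2, 3, 4, 5]
    print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
    print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
    print(pearson_r([1, 2, 3], [1, 3, 2]))  # 0.5  (moderate positive)
```

The sign of r comes entirely from Sxy, which is why a downward-sloping cloud such as the heating-cost plot gives a negative value, while a shapeless cloud such as the quality-rating plot gives a value near zero.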
[Figure: Scatter plot with Sales ($) on the x-axis]
Figure 4.50 Scatter plot with fitted line: heating cost vs. average temperature
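The text does not say how the fitted line in Figure 4.50 was computed. The sketch below assumes the usual ordinary least-squares line y = b0 + b1x, with slope b1 = Sxy / Sxx and intercept b0 = ȳ - b1x̄. The function name and the data are hypothetical; the data only mimic the downward trend of heating cost against average temperature.

```python
from typing import Sequence, Tuple


def least_squares_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept b0, slope b1) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # line passes through (mean_x, mean_y)
    return b0, b1


if __name__ == "__main__":
    # Hypothetical data: cost falls as average temperature rises
    temp = [10, 20, 30, 40, 50]
    cost = [300, 250, 210, 150, 100]
    b0, b1 = least_squares_line(temp, cost)
    print(f"fitted line: cost = {b0:.1f} + ({b1:.2f}) * temp")
```

For these hypothetical values the fitted line is cost = 352 - 5 × temp. The negative slope agrees in sign with the negative correlation coefficient, because both are computed from the same Sxy.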
Example 4.18
The bubble plots in Figures 4.51 and 4.52 investigate the relationship between three variables for a large retailer: advertisement expenditure, sales (both in thousands of dollars), and store size. The retailer has stores of different sizes, classified as small, medium, and large.
In Figure 4.51, the small, medium, and large store sizes are labeled 1, 2, and 3, respectively, whereas in Figure 4.52 the store sizes are labeled by name rather than by number. The bubble plots show that higher advertisement expenditure is associated with higher sales, but the large stores do not necessarily have the largest sales.
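A bubble plot adds a third variable to a scatter plot by varying the size or label of each point. As a minimal sketch, the helper below converts the store-size categories into the numeric codes 1, 2, and 3 used in Figure 4.51 and into marker areas that a plotting library could draw. The function name, the base area, and the sample records are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Numeric codes for store size, as in Figure 4.51 (1 = small, 2 = medium, 3 = large)
SIZE_CODE: Dict[str, int] = {"Small": 1, "Medium": 2, "Large": 3}


def bubble_points(
    advertisement: Sequence[float],
    sales: Sequence[float],
    store_size: Sequence[str],
    base_area: float = 40.0,  # hypothetical marker area for a small store
) -> List[Tuple[float, float, int, float]]:
    """Return (x, y, size code, marker area) for each store.

    The marker area grows in proportion to the size code, so a large
    store is drawn with three times the area of a small one.
    """
    if not (len(advertisement) == len(sales) == len(store_size)):
        raise ValueError("all three inputs must have the same length")
    points = []
    for x, y, size in zip(advertisement, sales, store_size):
        code = SIZE_CODE[size]
        points.append((x, y, code, base_area * code))
    return points


if __name__ == "__main__":
    # Hypothetical records: advertisement and sales in $000, plus store size
    for point in bubble_points([6, 9, 12], [33, 41, 48], ["Small", "Large", "Medium"]):
        print(point)
```

With matplotlib, for example, the x values, y values, and areas could be passed to `matplotlib.pyplot.scatter` as `x`, `y`, and `s`, and the size code could be drawn next to each point with `annotate` to reproduce the labeled style of Figure 4.51.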
[Figure 4.51: Bubble plot of sales vs. advertisement; store sizes labeled 1 = small, 2 = medium, 3 = large]
[Figure 4.52: Bubble plot of sales vs. advertisement; store sizes labeled Small, Medium, Large]
The bubble plots in Figures 4.53 and 4.54 show some more variations.
[Bubble plot variations: points labeled by number, with Advertisement (x1) on one axis; points labeled by group (A, B, C), with Advertisement (x1) on one axis; Sales ($000) vs. Advertisement ($000)]
Types of graphs/charts | Description/application
Matrix plot of heating cost vs. avg. temp., house size, age of furnace [Figure: heating cost plotted against Avg. temp, House size, and Age of furnace] | A matrix plot is used to investigate the relationships between pairs of variables by creating an array of scatter plots. In regression analysis and modeling, the relationship between multiple variables is often of interest. In such cases, matrix plots can be created to visually investigate the relationship between the response variable and each of the independent variables, or predictors.
Matrix plot of production cost (y) with each independent variable [Figure: Cost plotted against No. of products, Machine hours, Overtime cost, Labor hours, and Material cost] | A variation of the matrix plot: a plot of the response variable (cost) on the y-axis against each of the independent variables.
Matrix plot of production cost (y) with each independent variable, with fitted regression line [Figure: Cost plotted against No. of products, Machine hours, Overtime cost, Labor hours, and Material cost, each with its fitted regression line] | A variation of the matrix plot: this matrix plot shows the fitted regression line on each plot. The response variable is cost, and the variables on the x-axis are the independent variables.
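The panels of a matrix plot are easier to compare when each one is also summarized numerically. The sketch below puts the response on the y-axis against each predictor, as in the matrix plots above. For each panel it computes the correlation coefficient, the fitted regression line (assuming ordinary least squares), and R². The function names, predictor names, and data are hypothetical and are not the production-cost data shown in the figures.

```python
import math
from typing import Dict, Sequence


def summarize_panel(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Correlation, least-squares intercept/slope, and R^2 for y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable gives an undefined panel summary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear regression, R^2 equals r squared
    return {"r": r, "intercept": intercept, "slope": slope, "r_squared": r * r}


def summarize_matrix(response: Sequence[float],
                     predictors: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """One summary per panel: the response on the y-axis vs. each predictor."""
    return {name: summarize_panel(x, response) for name, x in predictors.items()}


if __name__ == "__main__":
    # Hypothetical data for illustration only
    cost = [1500, 1800, 2100, 2500, 2900]
    predictors = {
        "machine_hours": [80, 110, 150, 190, 240],
        "overtime_cost": [500, 420, 610, 480, 560],
    }
    for name, s in summarize_matrix(cost, predictors).items():
        print(f"{name:>14}: r = {s['r']:+.3f}, "
              f"cost = {s['intercept']:.1f} + {s['slope']:.3f} * x, "
              f"R^2 = {s['r_squared']:.3f}")
```

Ranking the predictors by r or R² gives a quick numerical counterpart to scanning the panels by eye. A matrix plot remains useful alongside these numbers because it can reveal curvature or outliers that a single coefficient hides.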