Ch06-Descriptive Statistics
Applied Statistics and Probability for Engineers
Seventh Edition
Douglas C. Montgomery, George C. Runger
Chapter 6
Descriptive Statistics
Copyright © 2019 John Wiley & Sons, Inc. All Rights Reserved
6 Descriptive Statistics
CHAPTER OUTLINE
Learning Objectives for Chapter 6
After careful study of this chapter, you should be able to do the following:
1. Compute and interpret the sample mean, variance, standard deviation, median, and range
2. Explain the concepts of sample mean, sample variance, population mean, and population variance
3. Construct and interpret visual data displays, including the stem-and-leaf display, the histogram, and
the box plot
4. Explain the concept of random sampling
5. Construct and interpret normal probability plots
6. Explain how to use box plots and other data displays to visually compare two or more samples
of data
7. Know how to use simple time series plots to visually display the important features of time-
oriented data
8. Know how to construct and interpret scatter diagrams of two or more variables
For a finite population with 𝑁 equally likely values, the probability mass function is
𝑓(𝑥𝑖) = 1/𝑁 and the mean is
$\mu = \sum_{i=1}^{N} x_i f(x_i) = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample Variance and Standard Deviation
The variability or scatter in the data may be described by the sample variance or
the sample standard deviation.
The units of measurement for the sample variance are the square of the original
units of the variable, while the standard deviation measures variability in the original
units.
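As a minimal sketch of these two measures, the Python below computes the sample variance as the sum of squared deviations from the sample mean divided by 𝑛 − 1, and the sample standard deviation as its square root. The function names and the data values are illustrative assumptions, not taken from the slides.

```python
import math

def sample_variance(xs):
    """Sample variance: sum of squared deviations from the mean over n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Sample standard deviation, expressed in the original units of the data."""
    return math.sqrt(sample_variance(xs))

# Hypothetical measurements in pounds (assumed for illustration only).
data = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]
s2 = sample_variance(data)  # units: pounds squared
s = sample_std(data)        # units: pounds
print(f"s^2 = {s2:.4f}, s = {s:.4f}")
```

The variance comes out in squared units and the standard deviation in the original units, which is why the latter is usually easier to interpret.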
Example 6.2 | Sample Variance
The table displays the quantities needed for calculating
the sample variance and sample standard deviation.
The numerator of 𝑠² is the sum of squared deviations, $\sum_{i=1}^{n}(x_i - \bar{x})^2$.
Computation of 𝒔²
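The prior calculation follows the definition directly and is tedious. A shortcut
formula needs just two sums, $\sum x_i$ and $\sum x_i^2$:
$s^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$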
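The shortcut can be obtained by expanding the square in the definition and using $\sum x_i = n\bar{x}$. The steps below are the standard algebra, offered as a sketch rather than as the slide's own derivation.

```latex
\begin{aligned}
s^2 &= \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}
     = \frac{\sum_{i=1}^{n}x_i^2 - 2\bar{x}\sum_{i=1}^{n}x_i + n\bar{x}^2}{n-1} \\
    &= \frac{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2}{n-1}
     = \frac{\sum_{i=1}^{n}x_i^2 - \dfrac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}}{n-1}
\end{aligned}
```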
Example 6.3 | Shortcut Calculation for 𝑠²
For Example 6.2, we calculate the sample variance and standard deviation using the
shortcut method.
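The sketch below compares the shortcut calculation with the definitional one. The data are assumed values, chosen to be consistent with the range 13.6 − 12.3 quoted for the pull-off force data; they are not copied from the example's table, and the function names are hypothetical.

```python
def variance_shortcut(xs):
    """Sample variance from two sums: sum of x and sum of x squared."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def variance_definitional(xs):
    """Sample variance from squared deviations about the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Assumed pull-off force readings in pounds (illustrative only).
data = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]
shortcut = variance_shortcut(data)
definitional = variance_definitional(data)
print(f"shortcut = {shortcut:.4f}, definitional = {definitional:.4f}")
```

Both routes agree up to floating-point rounding. In floating-point arithmetic the shortcut subtracts two large, nearly equal numbers, so the definitional form is usually the safer choice when the mean is large relative to the spread.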
The meaning of 𝒏 − 𝟏 in the denominator
• The population variance is calculated with 𝑁, the population size. Why
isn’t the sample variance calculated with 𝑛, the sample size?
• The true variance is based on deviations of the data from the true mean, 𝜇.
• The sample calculation is based on deviations of the data from 𝑥̄, not 𝜇.
• 𝑥̄ is an estimator of 𝜇; close but not the same.
• Deviations from 𝑥̄ tend to be smaller than deviations from 𝜇, so the 𝑛 − 1 divisor
is used to compensate for the error in estimating the mean.
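One way to see the effect of the divisor is to enumerate every possible sample from a small population and average the two candidate estimates. The sketch below assumes a hypothetical population of six equally likely values and samples of size 2 drawn with replacement; exact fractions avoid any rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # hypothetical finite population
N = len(population)
mu = Fraction(sum(population), N)
# Population variance uses the divisor N and the true mean mu.
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N

n = 2  # sample size (assumed)

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample about its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((Fraction(x) - xbar) ** 2 for x in sample)

# All ordered samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))
mean_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
mean_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, mean_div_n, mean_div_n_minus_1)
```

Under these assumptions the average of the 𝑛 − 1 version equals the population variance exactly, while the 𝑛 version averages to a smaller value.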
Degrees of Freedom
• When the sample variance is calculated with the quantity 𝑛 − 1 in the
denominator, the quantity 𝑛 − 1 is called the degrees of freedom.
• Origin of the term:
• There are 𝑛 deviations from 𝑥̄ in the sample
• The sum of the deviations is zero
• 𝑛 − 1 of the deviations can be freely determined, but the 𝑛th deviation is fixed
to maintain the zero sum
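The zero-sum constraint can be checked numerically. The sketch below uses a hypothetical sample; it shows that the deviations sum to zero, so the last deviation can be recovered from the other 𝑛 − 1.

```python
import math

data = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]  # hypothetical sample
n = len(data)
xbar = sum(data) / n
deviations = [x - xbar for x in data]

# The n deviations sum to zero (up to floating-point rounding).
total = sum(deviations)

# So the last deviation is fixed once the first n - 1 are known.
implied_last = -sum(deviations[:-1])
degrees_of_freedom = n - 1

print(f"sum of deviations = {total:.2e}")
print(f"last deviation = {deviations[-1]:.4f}, implied = {implied_last:.4f}")
```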
Sample Range
In addition to the sample variance and sample standard deviation, the
sample range is a useful measure of variability. It is the difference between the
largest and smallest observations: 𝑟 = max(𝑥𝑖) − min(𝑥𝑖).
For Example 6.3 (pull-off force data), the sample range is 𝑟 = 13.6 − 12.3 = 1.3.
Stem-and-Leaf Diagrams
Example 6.4a | Alloy Strength
• Consider the data in the table. We select as stem values the numbers 7, 8, 9, …, 24.
Example 6.4b | Alloy Strength
• The resulting stem-and-leaf diagram is
shown.
• Inspection of the diagram reveals that
most of the compressive strengths lie
between 110 and 200 psi and that a
central value is somewhere between 150
and 160 psi.
• The strengths are distributed
approximately symmetrically about the
central value.
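A stem-and-leaf display of this kind can be sketched in a few lines of Python. The version below assumes integer data, takes the last digit as the leaf and the remaining leading digits as the stem, and uses hypothetical strength values rather than the table's data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    # Include empty stems so that gaps in the data stay visible.
    lo, hi = min(stems), max(stems)
    return {stem: stems.get(stem, []) for stem in range(lo, hi + 1)}

def render(display):
    """Format the display as one text line per stem."""
    lines = []
    for stem, leaves in display.items():
        lines.append(f"{stem:>3} | {''.join(str(d) for d in leaves)}")
    return "\n".join(lines)

# Hypothetical strengths in psi (not the values from the example's table).
strengths = [76, 87, 97, 101, 105, 110, 115, 118, 121, 123, 134, 245]
display = stem_and_leaf(strengths)
print(render(display))
```

With these assumed values the stems run from 7 to 24, and stems with no observations appear as empty rows.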
Frequency Distributions and Histograms
• A frequency distribution is a more compact summary of data than a
stem-and-leaf diagram
• To construct one, we must divide the range of the data into intervals, which are
usually called class intervals, cells, or bins
• Choosing the number of bins approximately equal to the square root of the
number of observations often works well in practice
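The construction can be sketched as follows, assuming equal-width bins that span the observed range and a default bin count of about the square root of 𝑛. The function name, the handling of the largest value, and the data are assumptions made for illustration.

```python
import math

def frequency_distribution(xs, num_bins=None):
    """Count observations in equal-width bins; default is about sqrt(n) bins."""
    n = len(xs)
    if num_bins is None:
        num_bins = max(1, round(math.sqrt(n)))
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / num_bins
    if width == 0:
        # All observations are identical: a single degenerate bin.
        return [lo, hi], [n]
    counts = [0] * num_bins
    for x in xs:
        # Bins are [lower, upper); the maximum value goes in the last bin.
        k = min(int((x - lo) / width), num_bins - 1)
        counts[k] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

# Hypothetical sample of 16 observations, so about sqrt(16) = 4 bins.
data = [70, 72, 75, 78, 80, 81, 83, 85, 86, 88, 90, 91, 93, 95, 98, 110]
edges, counts = frequency_distribution(data)
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:g}, {right:g}): {c}")
```

In practice the bin limits are often rounded to convenient values rather than taken directly from the minimum and maximum.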
Frequency Distribution Table
Histograms
• A histogram is a visual display of the frequency distribution
• Provides a visual impression of the shape and distribution of the measurements
and information about the central tendency and scatter or dispersion in the data
• If unequal bin widths are employed, the rectangle's area (not its height) should be
proportional to the bin frequency, so that
Rectangle height = bin frequency / bin width
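A short sketch of the height rule for unequal bin widths follows. The bin edges and frequencies are hypothetical, and the function name is an assumption.

```python
def rectangle_heights(edges, counts):
    """Height of each histogram rectangle = bin frequency / bin width."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    return [c / (hi - lo) for c, lo, hi in zip(counts, edges[:-1], edges[1:])]

edges = [0, 10, 20, 50]  # hypothetical bins of width 10, 10, and 30
counts = [5, 10, 15]     # hypothetical bin frequencies
heights = rectangle_heights(edges, counts)

# Each rectangle's area (height times width) recovers the bin frequency.
areas = [h * (hi - lo) for h, lo, hi in zip(heights, edges[:-1], edges[1:])]
print(heights, areas)
```

Although the widest bin holds the most observations here, its rectangle is no taller than that of the first bin, because the frequency is spread over three times the width.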
Time Sequence Plots
• A time series or time sequence is a data set in which the observations
are recorded in the order in which they occur
• A time series plot is a graph in which the vertical axis denotes the
observed value of the variable and the horizontal axis denotes time
Time Sequence Plots
• A digidot plot combines a stem-and-leaf plot with a time series plot
Scatter Diagrams
• Multivariate data: each observation consists
of measurements of several variables
• The scatter diagram is a useful way to
graphically display the potential
relationship between a pair of variables,
for example quality and one of the other
variables
• When two or more variables exist, the
matrix of scatter diagrams may be
useful in looking at all of the pairwise
relationships between the variables in
the sample
• The sample correlation coefficient is a
quantitative measure of the strength of
the linear relationship between two
random variables x and y
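As a minimal sketch, the sample correlation coefficient can be computed from the sums of squares and cross-products of the deviations, r = S_xy / √(S_xx · S_yy). The function name and the paired data are hypothetical.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(f"r = {r:.4f}")
```

Values of r near +1 or −1 indicate a strong linear relationship, and values near 0 indicate a weak one; r says nothing about nonlinear patterns, which is one reason to inspect the scatter diagram as well.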
Probability Plots
• A probability plot is a graphical method for determining whether sample
data conform to a hypothesized distribution based on a subjective visual
examination of the data
• To construct a probability plot:
• Rank the data observations in the sample from smallest to largest: x(1), x(2),…, x(n).
• The observed value x(j) is plotted against the observed cumulative frequency (j – 0.5)/n.
• The paired numbers are plotted on the probability paper of the proposed distribution.
• If the plotted points fall approximately along a straight line, then the hypothesized
distribution adequately describes the data; points that deviate markedly from a
straight line suggest that it does not.
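The first two construction steps can be sketched as below: sort the sample, then pair each ordered observation x(j) with its cumulative frequency (j − 0.5)/n. The data are hypothetical, and plotting on probability paper is left out.

```python
def plotting_positions(xs):
    """Pair each ordered observation x_(j) with (j - 0.5) / n, for j = 1..n."""
    ordered = sorted(xs)
    n = len(ordered)
    return [(x, (j - 0.5) / n) for j, x in enumerate(ordered, start=1)]

# Hypothetical sample of five observations.
sample = [4.2, 3.1, 5.0, 3.8, 4.6]
points = plotting_positions(sample)
for x, p in points:
    print(f"x = {x:.1f}, cumulative frequency = {p:.2f}")
```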
Example 6.7 | Battery Life
• The effective service life (𝑋𝑗 in
minutes) of batteries used in a
laptop is given in the table.
• We hypothesize that battery life is
adequately modeled by a normal
distribution.
• To test this hypothesis, first
arrange the observations in
ascending order, calculate
their cumulative frequencies, and
plot them.
Normal Probability Plot
• Can be constructed on ordinary axes by plotting the standardized normal
scores 𝑧𝑗 against 𝑥(𝑗), where the standardized normal scores satisfy
$\frac{j - 0.5}{n} = P(Z \le z_j) = \Phi(z_j)$
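The normal scores can be obtained by inverting the standard normal cumulative distribution function at (j − 0.5)/n. The sketch below uses `statistics.NormalDist` from the Python standard library; the battery-life values are hypothetical and the function name is an assumption.

```python
from statistics import NormalDist

def normal_scores(n):
    """Return z_j with Phi(z_j) = (j - 0.5) / n, for j = 1..n."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]

# Hypothetical battery lives in minutes (not the values from the table).
lifetimes = [193, 172, 208, 186, 224, 181, 202, 190, 215, 197]
ordered = sorted(lifetimes)
scores = normal_scores(len(ordered))
pairs = list(zip(ordered, scores))  # points (x_(j), z_j) to plot

for x, z in pairs:
    print(f"x = {x}, z = {z:+.3f}")
```

If the normal model is reasonable, the points (x(j), z_j) should fall close to a straight line when plotted on ordinary axes.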
Important Terms and Concepts
• Box plot
• Degrees of freedom
• Digidot plot
• Frequency distribution and histogram
• Histogram
• Interquartile range
• Matrix of scatter plots
• Multivariate data
• Normal probability plot
• Outlier
• Pareto chart
• Percentile
• Population mean
• Population standard deviation
• Population variance
• Probability plot
• Quartiles and percentiles
• Relative frequency distribution
• Sample correlation coefficient
• Sample mean
• Sample median
• Sample mode
• Sample range
• Sample standard deviation
• Sample variance
• Scatter diagram
• Stem-and-leaf diagram
• Time series