CHAPTER 3 Statistical Description of Data
Business Mathematics 41
Presented by: John Melvin V. Baranda
Learning Objectives
Describe data by using measures of central tendency and dispersion
quantitative variable
Statistical Description
Measures of Central Tendency – numbers describing typical data values.
Measures of Dispersion – numbers that describe the scatter of the data.
The Arithmetic Mean
Population mean:
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
Sample mean:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
where
x_i = the i-th data value in the population (for \mu) or in the sample (for \bar{x})
N = number of data values in the population
n = number of data values in the sample
Example 1 The Arithmetic Mean
As an example of the calculation of a population mean, consider the following data for
shipments of peanuts from a hypothetical U.S. exporter to five Canadian cities
weighted average) may be calculated. In this case, each data value is weighted according to its relative
importance. The formula for the weighted mean for a population or a sample will be as follows:
Population:
\[ \mu_w = \frac{\sum w_i x_i}{\sum w_i} \]
Sample:
\[ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} \]
where
w_i = weight assigned to the i-th data value
x_i = the i-th data value in the population (for \mu_w) or in the sample (for \bar{x}_w)
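The two formulas can be sketched in a few lines of Python. This is only an illustration: the function names and the small data set are hypothetical and are not the peanut shipment figures.

```python
def arithmetic_mean(values):
    """Sum of the data values divided by the number of values."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical data: three unit profits and the quantity sold at each profit.
profits = [10.0, 20.0, 30.0]
quantities = [1, 1, 2]

print(arithmetic_mean(profits))            # 20.0
print(weighted_mean(profits, quantities))  # (10 + 20 + 60) / 4 = 22.5
```

The weighted mean is pulled toward 30.0 because that value carries twice the weight of the others.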
Example 2 The Weighted Mean
Continuing with the peanut example, let's assume that shipments to the respective cities will be sold at the following profits per thousand bags: $15.00, $13.50, $15.50, $12.00, and $14.00.
Vancouver 228
Winnipeg 45
On average, this U.S. company's peanut shipments earned a profit of $14.04 per thousand bags during the time period involved.
The Median
In a set of data, the median is the value that has just as many values above it as below it. For example, the numbers of
bags of peanuts (in thousands) shipped by the U.S. exporter to the five Canadian cities were:
For example, Ryder System, Inc. reported the following data for percentage return on average assets over an 8-year
period.
Raw Data: 2.8 7.0 1.6 0.4 1.9 2.6 3.8 3.8
Ordered Data: 0.4 1.6 1.9 2.6 2.8 3.8 3.8 7.0
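With an even number of values there is no single middle value, so the median is taken as the average of the two middle values of the ordered data. A minimal sketch using the Ryder data above (the function name is an illustrative choice):

```python
import statistics


def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


returns = [2.8, 7.0, 1.6, 0.4, 1.9, 2.6, 3.8, 3.8]

print(round(median(returns), 2))             # 2.7, the average of 2.6 and 2.8
print(round(statistics.median(returns), 2))  # the standard library agrees
```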
The Mode
In a set of data, the mode is a value that occurs with the greatest frequency. As such, it could be considered the single data value most typical of all the values. Consider again the 8 years of return-on-assets data reported by Ryder System, Inc.:
Ordered Data: 0.4 1.6 1.9 2.6 2.8 3.8 3.8 7.0
Mode = 3.8, the only value that occurs more than once.
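Finding the mode amounts to counting how often each value occurs. The sketch below returns a list because a data set can have more than one mode; the function name and the second, bimodal data set are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


returns = [0.4, 1.6, 1.9, 2.6, 2.8, 3.8, 3.8, 7.0]
print(modes(returns))          # [3.8]

# Hypothetical bimodal data: two values tie for the greatest frequency.
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```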
Comparison of the Mean, Median, and Mode
The mean gives equal consideration to even very extreme values in the data, while the median tends to focus more closely on those in the middle of the data array. Thus, the mean is able to make more complete use of the data. However, as pointed out earlier, the mean can be strongly influenced by just two very low or high values. A demonstration of how a single observation can affect the mean and median is shown in Seeing Statistics Applet 1, at the end of the chapter.
There will be just one value for the mean and one value for the median. However, as indicated previously, the data may ha
The mode tends to be less useful than the mean and median as a measure of central tendency. Under certain circumstances, however, the mode can be uniquely valuable. For example, when a television retailer decides how many of each screen size to stock, it would do little good to know that the mean television set sold has a 38.53-inch screen; after all, there is no such thing as a 38.53-inch television. Knowledge that the mode is 30 inches would be much more useful.
Distribution Shape and Measures of Central Tendency
In a symmetrical distribution, such as that shown in part (a), the left and right sides of the distribution are mirror images of each other. The distribution in part (a) has a single mode, is bell shaped, and is known as the normal distribution. It will be discussed in Chapter 6; for now, just note that the values of the mean, median, and mode are equal.
Skewness refers to the tendency of the distribution to "tail off" to the right or left, as shown in parts (b) and (c) of Figure 3.2. In examining the three distribution shapes shown in the figure, note the following relationships among the mean, median, and mode.
(When a distribution is bimodal, it will of course be impossible for the mean, median, and both modes to be equal.)
Positively skewed distribution: The mean is greater than the median, which in turn is greater than the mode. Income distributions tend to be positively skewed, since there is a lower limit of zero, but practically no upper limit on how much a select few might earn. In such situations, the median will tend to be a better measure of central tendency than the mean. Statistics in Action 3.1 shows the income distribution for male income-earners in the United States. For each of the five age groups, the mean is greater than the median, indicating that the distribution is positively skewed.
Negatively skewed distribution: The mean is less than the median, which in turn is less than the mode. Data that have an upper limit (e.g., due to the maximum seating capacity of a theater or stadium) may exhibit a distribution that is negatively skewed. As with the positively skewed distribution, the median is less influenced by extreme values and tends to be a better measure of central tendency than the mean.
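The effect of skewness on the mean and median can be seen with a small numeric sketch. The income figures below are hypothetical (in thousands of dollars) and were chosen only so that a single high earner creates a long right tail.

```python
import statistics

# Hypothetical incomes in thousands of dollars; the last value is an extreme high earner.
incomes = [20, 25, 30, 35, 40, 45, 200]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
print(round(mean_income, 2), median_income)  # the mean exceeds the median

# Replace the extreme value with a moderate one: the median does not move,
# while the mean drops back to the middle of the data.
moderate = incomes[:-1] + [50]
print(statistics.mean(moderate), statistics.median(moderate))
```

Only the mean reacts to the extreme value, which is why the median is usually preferred for skewed data such as incomes.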
Range
The simplest measure of dispersion, the range is the difference between the highest and lowest values.
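As a quick illustration, the range of the eight Ryder return-on-assets values listed earlier can be computed directly (the variable names are illustrative):

```python
returns = [2.8, 7.0, 1.6, 0.4, 1.9, 2.6, 3.8, 3.8]

# Range = highest value minus lowest value.
data_range = max(returns) - min(returns)
print(round(data_range, 1))  # 7.0 - 0.4 = 6.6
```

Because it uses only the two most extreme values, the range says nothing about how the remaining values are spread.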
1) Percentile – divide the values into 100 parts of equal size, each comprising 1% of the observations.
2) Decile – divide the values into 10 parts of equal size, each comprising 10% of the observations. The
3) Quartiles – divide the values into four parts of equal size, each comprising 25% of the observations. The median is the second quartile, below which 50% of the values fall.
Example 3 Percentile
A percentile can be computed by first finding its location in the ordered data:
\[ L_p = (n + 1)\frac{P}{100} \]
where P is the desired percentile and n is the number of data values.
There are 25 test scores: 72, 54, 56, 61, 62, 66, 68, 43, 69, 69, 70, 71, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99. Find the 60th percentile.
Ordered Data: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93,
95, 96, 98, 99, 99.
\[ L_{60} = (25 + 1)\frac{60}{100} = 15.6 \]
Interpolation (the 15th ordered value is 79 and the 16th is 85):
60th percentile = 15th value + 0.6 × (16th value − 15th value)
60th percentile = 79 + 0.6 × (85 − 79)
60th percentile = 82.6
Example 4 Decile
A decile can be computed by first finding its location in the ordered data:
\[ L_p = (n + 1)\frac{P}{10} \]
where P is the desired decile and n is the number of data values.
There are 25 test scores: 72, 54, 56, 61, 62, 66, 68, 43, 69, 69, 70, 71, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99. Find the 7th decile.
Ordered Data: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93,
95, 96, 98, 99, 99.
\[ L_{7} = (25 + 1)\frac{7}{10} = 18.2 \]
Interpolation (the 18th ordered value is 88 and the 19th is 89):
7th decile = 18th value + 0.2 × (19th value − 18th value)
7th decile = 88 + 0.2 × (89 − 88)
7th decile = 88.2
Example 5 Quartile
A quartile can be computed by first finding its location in the ordered data:
\[ L_p = (n + 1)\frac{P}{4} \]
where P is the desired quartile and n is the number of data values.
There are 25 test scores: 72, 54, 56, 61, 62, 66, 68, 43, 69, 69, 70, 71, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99. Find the 3rd quartile.
Ordered Data: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93,
95, 96, 98, 99, 99.
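\[ L_{3} = (25 + 1)\frac{3}{4} = 19.5 \]
Interpolation (location 19.5 lies between the 19th ordered value, 89, and the 20th, 93):
3rd quartile = 19th value + 0.5 × (20th value − 19th value)
3rd quartile = 89 + 0.5 × (93 − 89)
3rd quartile = 91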
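The location-and-interpolation procedure used in Examples 3 to 5 can be sketched as a single function. The name `value_at_location` is an illustrative choice; the fraction passed in is P/100 for a percentile, P/10 for a decile, and P/4 for a quartile. Other textbooks and software packages use slightly different location rules, so results may differ a little elsewhere.

```python
def value_at_location(data, fraction):
    """Interpolated value at location (n + 1) * fraction in the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    location = (n + 1) * fraction
    whole = int(location)        # position of the lower neighbouring value (1-based)
    remainder = location - whole
    if whole < 1:                # location falls before the first value
        return ordered[0]
    if whole >= n:               # location falls at or beyond the last value
        return ordered[-1]
    lower = ordered[whole - 1]
    upper = ordered[whole]
    return lower + remainder * (upper - lower)


scores = [72, 54, 56, 61, 62, 66, 68, 43, 69, 69, 70, 71, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

print(round(value_at_location(scores, 60 / 100), 1))  # 60th percentile: 82.6
print(round(value_at_location(scores, 7 / 10), 1))    # 7th decile: 88.2
print(round(value_at_location(scores, 3 / 4), 1))     # 3rd quartile: 91.0 (location 19.5)
```

The fractional part of the location decides how far to move from the lower neighbouring value toward the upper one.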
Mean Absolute Deviation
With this descriptor, sometimes called the average deviation or the average absolute deviation, we now consider the extent to which the data values tend to differ from the mean. In particular, the mean absolute deviation (MAD) is the average of the absolute values of differences from the mean and may be expressed as follows:
\[ \mathrm{MAD} = \frac{\sum |x_i - \mu|}{N} \]
Year    R&D (x_i)
2001    4379
2002    6299
2003    6595
2004    7779
2005    6184
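Applying the MAD formula to the five R&D values in the table gives the following sketch. It treats the five years as the whole population, which is an assumption made here for illustration; the variable names are also illustrative.

```python
rd_spending = [4379, 6299, 6595, 7779, 6184]  # R&D values for 2001 to 2005

n = len(rd_spending)
mean = sum(rd_spending) / n                   # 31236 / 5 = 6247.2

# Absolute difference of each value from the mean.
absolute_deviations = [abs(x - mean) for x in rd_spending]

mad = sum(absolute_deviations) / n
print(round(mean, 1))  # 6247.2
print(round(mad, 2))   # 772.56
```

Without the absolute values, the positive and negative deviations would cancel and always sum to zero.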
Variance
The variance, a common measure of dispersion, includes all data values and is calculated by a mathematical formula. For a population, the variance (\(\sigma^2\), "sigma squared") is the average of squared differences between the N data values and the mean, \(\mu\). For a sample variance (\(s^2\)), the sum of the squared differences between the n data values and the mean, \(\bar{x}\), is divided by n − 1.
Population:
\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \]
Sample:
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]
where \(\sigma^2\) = population variance and \(s^2\) = sample variance
The positive square root of the variance is the standard deviation. The standard deviation is an especially important measure of dispersion because it is the basis for determining the proportion of data values within certain distances on either side of the mean for certain types of distributions (we will discuss these in a later chapter). The standard deviation may be expressed as
Population:
\[ \sigma = \sqrt{\sigma^2} \]
Sample:
\[ s = \sqrt{s^2} \]
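The difference between the population and sample formulas is only the divisor (N versus n − 1). The sketch below computes both for the five R&D values from the MAD table, first from the definitions and then with Python's standard `statistics` module. Whether those five years should be treated as a population or as a sample is left open here; both results are shown for comparison.

```python
import math
import statistics

rd_spending = [4379, 6299, 6595, 7779, 6184]
n = len(rd_spending)
mean = sum(rd_spending) / n

# Sum of squared differences from the mean, shared by both formulas.
sum_sq = sum((x - mean) ** 2 for x in rd_spending)

population_variance = sum_sq / n    # divide by N
sample_variance = sum_sq / (n - 1)  # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(round(population_variance, 2), round(population_sd, 2))
print(round(sample_variance, 2), round(sample_sd, 2))

# Cross-check against the standard library.
print(round(statistics.pvariance(rd_spending), 2), round(statistics.variance(rd_spending), 2))
```

Dividing by n − 1 always gives a slightly larger value than dividing by n, and the gap shrinks as the number of observations grows.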
Chebyshev’s Theorem
For either a sample or a population, the percentage of observations that fall within k (for k > 1) standard deviations of the mean will be at least
\[ \left(1 - \frac{1}{k^2}\right) \times 100 \]
Example:
The arithmetic mean biweekly amount contributed by the Dupree Paint employees to the company's profit-sharing plan is $51.54, and the standard deviation is $7.51. At least what percent of the contributions lie within plus and minus 3.5 standard deviations of the mean?
Solution:
\[ \left(1 - \frac{1}{k^2}\right) \times 100 = \left(1 - \frac{1}{(3.5)^2}\right) \times 100 \approx 92 \]
Answer: At least about 92% of the contributions lie within plus or minus 3.5 standard deviations of the mean.
Example 6 Chebyshev’s Theorem
The mean income of a group of sample observations is $500; the standard deviation is $40. According to Chebyshev's theorem, at least what percent of the incomes will lie between $400 and $600?
Solution: Both limits are $100 from the mean, so k = 100/40 = 2.5.
\[ \left(1 - \frac{1}{k^2}\right) \times 100 = \left(1 - \frac{1}{(2.5)^2}\right) \times 100 = 84\% \]
Answer: At least 84% of the incomes will lie between $400 and $600.
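Both Chebyshev examples follow the same two steps: express the distance from the mean in standard deviations (k), then apply the formula. A minimal sketch, with an illustrative function name:

```python
def chebyshev_min_percent(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem applies only for k > 1")
    return (1 - 1 / k ** 2) * 100


# Dupree Paint example: k is given directly as 3.5.
print(round(chebyshev_min_percent(3.5), 1))  # 91.8, about 92%

# Income example: limits of $400 and $600 around a mean of $500 with s = $40.
k = (600 - 500) / 40                         # 2.5 standard deviations
print(round(chebyshev_min_percent(k), 1))    # 84.0
```

The result is a lower bound that holds for any distribution shape, so the actual percentage may be considerably higher.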
Empirical Rule
EMPIRICAL RULE For a symmetrical, bell-shaped frequency distribution, approximately 68 percent of the observations will lie within plus and minus one standard deviation of the mean; about 95 percent of the observations will lie within plus and minus two standard deviations of the mean; and practically all (99.7 percent) will lie within plus and minus three standard deviations of the mean.
Example 7 Empirical Rule
A sample of the rental rates at University Park Apartments approximates a symmetrical, bell-shaped distribution. The sample mean is $500; the standard deviation is $20. Using the Empirical Rule, answer these questions:
1. About 68 percent of the monthly rentals are between what two amounts?
2. About 95 percent of the monthly rentals are between what two amounts?
3. Almost all of the monthly rentals are between what two amounts?
Solution:
1. About 68% are between $480 and $520, found by \(\bar{X} \pm 1s\) = $500 ± 1($20)
2. About 95% are between $460 and $540, found by \(\bar{X} \pm 2s\) = $500 ± 2($20)
3. Almost all (99.7%) are between $440 and $560, found by \(\bar{X} \pm 3s\) = $500 ± 3($20)
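The three intervals in the solution can be generated from the mean and standard deviation alone. The sketch below uses an illustrative function name and applies only when the distribution is approximately symmetrical and bell-shaped.

```python
def empirical_rule_intervals(mean, std_dev):
    """Map k = 1, 2, 3 to (lower limit, upper limit, approximate percent covered)."""
    approximate_percent = {1: 68.0, 2: 95.0, 3: 99.7}
    return {
        k: (mean - k * std_dev, mean + k * std_dev, percent)
        for k, percent in approximate_percent.items()
    }


rentals = empirical_rule_intervals(500, 20)
for k, (low, high, percent) in rentals.items():
    print(f"About {percent}% of rentals are between ${low} and ${high}")
```

For a mean of $500 and a standard deviation of $20 this reproduces the limits $480 to $520, $460 to $540, and $440 to $560.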
Example 8 Empirical Rule
The distribution of the weights of a sample of 1,400 cargo containers is symmetric and bell-shaped. According to the Empirical Rule, what percent of the weights will lie:
Solution:
a. 95%
b. 47.5%, 15%
Grouped Data
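The frequency distribution, also referred to as grouped data, is a convenient summary of raw data, but it loses some of the information originally contained in the data. As a result, measures of central tendency and dispersion determined from the frequency distribution are only approximations of the values that would be calculated from the raw data.
Approximate mean:
\[ \bar{x} = \frac{\sum f_i m_i}{n} \]
Approximate variance:
\[ s^2 = \frac{\sum f_i m_i^2 - n\bar{x}^2}{n - 1} \]
where f_i is the frequency of class i, m_i is the midpoint of class i, and n = \sum f_i.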
\[ \bar{x} = \frac{\sum f_i m_i}{n} = \frac{1980}{50} = 39.6 \]
Approximate standard deviation:
\[ s = \sqrt{83.77} = 9.15 \]
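The grouped-data formulas can be sketched as follows. The frequency table behind the figures above is not reproduced in this section, so the class midpoints and frequencies below are hypothetical and serve only to show the mechanics; the function name is likewise illustrative.

```python
import math


def grouped_statistics(midpoints, frequencies):
    """Approximate mean, sample variance, and standard deviation from grouped data."""
    n = sum(frequencies)
    sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))
    sum_fm2 = sum(f * m ** 2 for f, m in zip(frequencies, midpoints))
    mean = sum_fm / n
    variance = (sum_fm2 - n * mean ** 2) / (n - 1)
    return mean, variance, math.sqrt(variance)


# Hypothetical frequency distribution: three classes with midpoints 10, 20, 30.
midpoints = [10, 20, 30]
frequencies = [2, 5, 3]

mean, variance, std_dev = grouped_statistics(midpoints, frequencies)
print(mean)                # 210 / 10 = 21.0
print(round(variance, 2))  # (4900 - 4410) / 9 = 54.44
print(round(std_dev, 2))   # 7.38
```

Every observation in a class is treated as if it sat at the class midpoint, which is why the results are approximations.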