
CHAPTER 3 Statistical Description of Data

This document discusses various measures of central tendency and dispersion used to describe data, including:
- The arithmetic mean, weighted mean, median, and mode as measures of central tendency.
- The range, quantiles, mean absolute deviation, variance, and standard deviation as measures of dispersion.
- Formulas for the population and sample means and for weighted means, with worked examples.
- Relationships among the mean, median, and mode for symmetrical, positively skewed, and negatively skewed distributions.

Uploaded by Lara Flores
Copyright © All Rights Reserved

CHAPTER 2: Statistical Description of Data
Business Mathematics 41
Presented by: John Melvin V. Baranda
Learning Objectives
- Describe data by using measures of central tendency and dispersion
- Convert data to standardized values
- Determine measures of central tendency and dispersion for grouped data
- Use the coefficient of correlation to measure the association between two quantitative variables
Statistical Description
- Measures of Central Tendency – numbers describing typical data values. The primary measures of central tendency are the arithmetic mean, weighted mean, median, and mode.
- Measures of Dispersion – numbers that describe the scatter of the data. The measures of dispersion discussed in this chapter are the range, quantiles, mean absolute deviation, variance, and standard deviation.


The Arithmetic Mean
Defined as the sum of the data values divided by the number of observations, the arithmetic mean is one of the most common measures of central tendency.

Population Mean:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where $\mu$ = population mean
$x_i$ = the ith data value in the population
$\sum$ = summation, or "the sum of"
$N$ = number of data values in the population

Sample Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $\bar{x}$ = sample mean
$x_i$ = the ith data value in the sample
$\sum$ = summation, or "the sum of"
$n$ = number of data values in the sample
Example 1 The Arithmetic Mean
As an example of the calculation of a population mean, consider the following data for shipments of peanuts from a hypothetical U.S. exporter to five Canadian cities:

City        Peanuts (thousands of bags)
Montreal    64
Ottawa      15
Toronto     285
Vancouver   228
Winnipeg    45

Solution: We can calculate the arithmetic mean ($\mu$) for these data as follows:
$$\mu = \frac{\sum x_i}{N} = \frac{64 + 15 + 285 + 228 + 45}{5} = 127.4 \text{ thousand bags}$$
On the average, each Canadian destination for this U.S. company's peanuts received 127.4 thousand bags of peanuts during the time period involved.
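The calculation in Example 1 can be checked with a short Python sketch (standard library only). The function name `population_mean` is our own illustrative choice, not something from the source.

```python
# Shipments from Example 1, in thousands of bags
shipments = {
    "Montreal": 64,
    "Ottawa": 15,
    "Toronto": 285,
    "Vancouver": 228,
    "Winnipeg": 45,
}

def population_mean(values):
    """Arithmetic mean: sum of the data values divided by their count N."""
    values = list(values)
    return sum(values) / len(values)

mu = population_mean(shipments.values())
print(f"mu = {mu:.1f} thousand bags")  # mu = 127.4 thousand bags
```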
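Example 1 The Arithmetic Mean
To help understand what the arithmetic mean represents, picture a playground seesaw with markings ranging from 0 at one end to 300 at the other end, with five people of equal weight sitting at positions matching the shipments received by each Canadian city. Ignoring the weight of the seesaw itself, it balances when the pivot is placed at 127.4, the mean of the five positions: the arithmetic mean is the balance point of the data.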
The Weighted Mean
When some values are more important than others, a weighted mean (sometimes referred to as a weighted average) may be calculated. In this case, each data value is weighted according to its relative importance. The formulas for the weighted mean of a population and of a sample are as follows:

Population Mean:
$$\mu_w = \frac{\sum w_i x_i}{\sum w_i}$$
where $w_i$ = weight assigned to the ith data value
$x_i$ = the ith data value in the population

Sample Mean:
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$$
where $w_i$ = weight assigned to the ith data value
$x_i$ = the ith data value in the sample
Example 2 The Weighted Mean
Continuing with the peanut example, let's assume that shipments to the respective cities will be sold at the following profits per thousand bags: $15.00, $13.50, $15.50, $12.00, and $14.00. Each profit figure is weighted by the number of bags shipped to that city.

City        Peanuts (thousands of bags), w_i    Profit per thousand bags, x_i
Montreal    64                                   $15.00
Ottawa      15                                   $13.50
Toronto     285                                  $15.50
Vancouver   228                                  $12.00
Winnipeg    45                                   $14.00

Solution: We can calculate the weighted mean ($\mu_w$) for these data as follows:
$$\mu_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{64(\$15.00) + 15(\$13.50) + 285(\$15.50) + 228(\$12.00) + 45(\$14.00)}{64 + 15 + 285 + 228 + 45} = \frac{\$8{,}946.00}{637} = \$14.04 \text{ per thousand bags}$$
On the average, the company earned a profit of $14.04 per thousand bags of peanuts shipped to these Canadian destinations during the time period involved.
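A minimal Python sketch of the weighted mean for Example 2, assuming the shipments are the weights $w_i$ and the profits per thousand bags are the values $x_i$. The helper `weighted_mean` is hypothetical; note that it divides by the total weight (637), not by the number of cities.

```python
# Example 2: weights are shipments (thousands of bags), values are profit per thousand bags
weights = [64, 15, 285, 228, 45]
profits = [15.00, 13.50, 15.50, 12.00, 14.00]

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights (not by N)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight

mu_w = weighted_mean(profits, weights)
print(f"mu_w = ${mu_w:.2f} per thousand bags")  # mu_w = $14.04 per thousand bags
```

Dividing the weighted total by N = 5 instead would give a figure with no meaning as a profit per thousand bags, which is why the denominator is the sum of the weights.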
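The Median
In a set of data, the median is the value that has just as many values above it as below it. For example, the numbers of bags of peanuts (in thousands) shipped by the U.S. exporter to the five Canadian cities, arranged in order, were:

15.0 45.0 64.0 228.0 285.0

With an odd number of values, the median is the middle one: 64.0 thousand bags, with two values below it and two above it.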

The Median
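When the number of values is even, the median is the average of the two middle values. For example, Ryder System, Inc. reported the following data for percentage return on average assets over an 8-year period.

Raw Data: 2.8 7.0 1.6 0.4 1.9 2.6 3.8 3.8

Ordered Data: 0.4 1.6 1.9 2.6 2.8 3.8 3.8 7.0

The two middle values are 2.6 and 2.8, so the median is (2.6 + 2.8)/2 = 2.7 percent.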
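The odd and even cases of the median can be sketched in Python as follows. The hand-written `median` helper is illustrative; Python's standard `statistics.median` follows the same odd/even rule and is used here only as a cross-check.

```python
import statistics

peanuts = [64, 15, 285, 228, 45]                  # odd n: middle value
ryder = [2.8, 7.0, 1.6, 0.4, 1.9, 2.6, 3.8, 3.8]  # even n: mean of the two middle values

def median(values):
    """Sort the data; return the middle value (odd n) or the mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median(peanuts), median(ryder))
# The standard library applies the same rule
print(statistics.median(peanuts), statistics.median(ryder))
```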
The Mode
In a set of data, the mode is a value that occurs with the greatest frequency. As such, it could be considered the single data value most typical of all the values. Consider again the 8 years of return-on-assets data reported by Ryder System, Inc.:

Ordered Data: 0.4 1.6 1.9 2.6 2.8 3.8 3.8 7.0

The mode is 3.8, the only value that occurs more than once.
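A minimal sketch of finding the mode, using `collections.Counter` from Python's standard library. The helper `modes` is hypothetical and returns a list so that data with more than one mode are handled; if no value repeats, it returns every value, whereas some texts would instead say such data have no mode.

```python
from collections import Counter

ryder = [0.4, 1.6, 1.9, 2.6, 2.8, 3.8, 3.8, 7.0]

def modes(values):
    """Return every value that occurs with the greatest frequency (data may have several modes)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes(ryder))  # [3.8]
```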
Comparison of the Mean, Median, and Mode
- The mean gives equal consideration to even very extreme values in the data, while the median tends to focus more closely on those in the middle of the data array. Thus, the mean is able to make more complete use of the data. However, as pointed out earlier, the mean can be strongly influenced by just one or two very low or high values. A demonstration of how a single observation can affect the mean and median is shown in Seeing Statistics Applet 1, at the end of the chapter.
- There will be just one value for the mean and one value for the median. However, as indicated previously, the data may have more than one mode.
- The mode tends to be less useful than the mean and median as a measure of central tendency. Under certain circumstances, however, the mode can be uniquely valuable. For example, when a television retailer decides how many of each screen size to stock, it would do the retailer little good to know that the mean television set sold has a 38.53-inch screen; after all, there is no such thing as a 38.53-inch television. Knowledge that the mode is 30 inches would be much more useful.
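To see the point about extreme values concretely, the sketch below compares the mean and median of the Example 1 shipments with a version in which the Toronto figure is 2,850 instead of 285. The altered figure is hypothetical, invented purely for illustration: the mean jumps from 127.4 to 640.4, while the median stays at 64.

```python
from statistics import mean, median

shipments = [64, 15, 285, 228, 45]
# Hypothetical: the Toronto figure is ten times larger (2850 instead of 285)
with_outlier = [64, 15, 2850, 228, 45]

for label, data in [("original", shipments), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean = {mean(data):7.1f}, median = {median(data)}")
```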
Distribution Shape and Measures of Central Tendency
In a symmetrical distribution, such as that shown in part (a), the left and right sides of the distribution are mirror images of each other. The distribution in part (a) has a single mode, is bell shaped, and is known as the normal distribution. It will be discussed in Chapter 6; for now, just note that the values of the mean, median, and mode are equal.

Skewness refers to the tendency of the distribution to "tail off" to the right or left, as shown in parts (b) and (c) of Figure 3.2. In examining the three distribution shapes shown in the figure, note the following relationships among the mean, median, and mode:


Distribution Shape and Measures of Central Tendency
- Symmetrical distribution: The mean, median, and mode are the same. This will be true for any symmetrical, unimodal distribution, such as this one. (When a distribution is bimodal, it will of course be impossible for the mean, median, and both modes to be equal.)
- Positively skewed distribution: The mean is greater than the median, which in turn is greater than the mode. Income distributions tend to be positively skewed, since there is a lower limit of zero but practically no upper limit on how much a select few might earn. In such situations, the median will tend to be a better measure of central tendency than the mean. Statistics in Action 3.1 shows the income distribution for male income-earners in the United States. For each of the five age groups, the mean is greater than the median, indicating that all of the distributions are positively skewed.
- Negatively skewed distribution: The mean is less than the median, which in turn is less than the mode. Data that have an upper limit (e.g., due to the maximum seating capacity of a theater or stadium) may exhibit a distribution that is negatively skewed. As with the positively skewed distribution, the median is less influenced by extreme values and tends to be a better measure of central tendency than the mean.
Range
- The simplest measure of dispersion, the range is the difference between the highest and lowest values.

Range: 271.4 - 1.2 = 270.2
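Applied to the peanut shipments from Example 1 (the data behind the 271.4 − 1.2 figure are not listed in this section), the range is a one-line calculation. A minimal sketch; `data_range` is a hypothetical name chosen to avoid shadowing Python's built-in `range`.

```python
shipments = [64, 15, 285, 228, 45]  # thousands of bags, Example 1

def data_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)

print(data_range(shipments))  # 270
```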


Quantiles
- Like the median, quantiles separate the ordered data into equal-size groups. There are several kinds of quantiles:

1) Percentiles – divide the ordered values into 100 parts of equal size, each comprising 1% of the observations. The median is the 50th percentile.

2) Deciles – divide the ordered values into 10 parts of equal size, each comprising 10% of the observations. The median is the 5th decile.

3) Quartiles – divide the ordered values into four parts of equal size, each comprising 25% of the observations. The median is the second quartile, below which 50% of the values fall.
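Example 3 Percentile
The location of a percentile can be computed using the formula:

$$L_p = (n + 1)\frac{P}{100}$$

where $L_p$ is the position of the Pth percentile in the ordered data and $n$ is the number of observations.

Learn how to calculate a percentile for the given example. The following are 25 test scores: 72, 54, 56, 61, 62, 66, 68, 43, 69, 69, 70, 71, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99. Find the 60th percentile.

Ordered Data: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99.

$$L_{60} = (25 + 1)\frac{60}{100} = 15.6$$

The 60th percentile therefore lies 0.6 of the way from the 15th ordered value (79) to the 16th (85).
Interpolation:
60th percentile = [(16th value − 15th value) × 0.6] + 15th value
60th percentile = [(85 − 79) × 0.6] + 79
60th percentile = 82.6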
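The location-and-interpolation procedure of Example 3 can be sketched in Python as below. This assumes the (n + 1)P/100 location rule used in this chapter; the handling of locations below 1 or beyond n (clamping to the smallest or largest value) is our own assumption. Spreadsheet and library percentile functions may use other conventions and give slightly different answers.

```python
scores = [72, 54, 56, 61, 62, 66, 68, 43, 69, 69, 70, 71, 77, 78, 79,
          85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

def percentile(data, p):
    """Pth percentile using the location L_p = (n + 1) * P / 100 with linear interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    location = (n + 1) * p / 100   # 1-based position, may be fractional
    whole = int(location)          # position of the lower neighbour
    fraction = location - whole    # how far to move toward the next value
    if whole < 1:                  # location falls before the first value
        return ordered[0]
    if whole >= n:                 # location falls at or beyond the last value
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)

print(round(percentile(scores, 60), 2))  # 82.6
```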
Example 4 Decile
The location of a decile can be computed using the formula:

$$L_D = (n + 1)\frac{D}{10}$$

where $D$ is the decile number (1 through 9).

Learn how to calculate a decile for the given example, using the same 25 test scores: 72, 54, 56, 61, 62, 66, 68, 43, 69, 69, 70, 71, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99. Find the 7th decile.

Ordered Data: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99.

$$L_7 = (25 + 1)\frac{7}{10} = 18.2$$

Interpolation:
7th decile = [(19th value − 18th value) × 0.2] + 18th value
7th decile = [(89 − 88) × 0.2] + 88
7th decile = 88.2
Example 5 Quartile
The location of a quartile can be computed using the formula:

$$L_Q = (n + 1)\frac{Q}{4}$$

where $Q$ is the quartile number (1, 2, or 3).

Learn how to calculate a quartile for the given example, using the same 25 test scores: 72, 54, 56, 61, 62, 66, 68, 43, 69, 69, 70, 71, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99. Find the 3rd quartile.

Ordered Data: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99.

$$L_3 = (25 + 1)\frac{3}{4} = 19.5$$

The 3rd quartile therefore lies halfway between the 19th ordered value (89) and the 20th (93).
Interpolation:
3rd quartile = [(20th value − 19th value) × 0.5] + 19th value
3rd quartile = [(93 − 89) × 0.5] + 89
3rd quartile = 91
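Because deciles and quartiles use the same location rule with 10 or 4 in place of 100, one hypothetical helper `quantile(data, k, parts)` can cover Examples 4 and 5. This is a sketch under the same assumptions as the chapter's formula (1-based location, linear interpolation between neighbours, clamping at the ends as our own choice). For the 3rd quartile, the location 19.5 falls halfway between the 19th ordered score (89) and the 20th (93).

```python
scores = [72, 54, 56, 61, 62, 66, 68, 43, 69, 69, 70, 71, 77, 78, 79,
          85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

def quantile(data, k, parts):
    """kth quantile when the ordered data are split into `parts` groups.

    parts=100 gives percentiles, parts=10 deciles, parts=4 quartiles.
    Uses L = (n + 1) * k / parts with linear interpolation between neighbours.
    """
    ordered = sorted(data)
    n = len(ordered)
    location = (n + 1) * k / parts
    whole = int(location)
    fraction = location - whole
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)

print(round(quantile(scores, 7, 10), 2))  # 88.2
print(round(quantile(scores, 3, 4), 2))   # 91.0
```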
Mean Absolute Deviation
- With this descriptor, sometimes called the average deviation or the average absolute deviation, we now consider the extent to which the data values tend to differ from the mean. In particular, the mean absolute deviation (MAD) is the average of the absolute values of differences from the mean and may be expressed as follows:

$$MAD = \frac{\sum |x_i - \mu|}{N}$$

where $\mu$ = population mean
$x_i$ = the ith data value
$N$ = number of data values in the population


Example 5 Mean Absolute Deviation
To illustrate how the mean absolute deviation is calculated, we'll examine the following figures, which represent annual research and development (R&D) expenditures by Microsoft Corporation.

Year    R&D, x_i
2001    4379
2002    6299
2003    6595
2004    7779
2005    6184
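The example lists the data but stops before the calculation. Treating the five years as a population, as the MAD formula does, a minimal Python sketch follows; worked by hand, the mean is 31,236/5 = 6,247.2, the absolute deviations sum to 3,862.8, and MAD = 3,862.8/5 = 772.56. The helper name `mean_absolute_deviation` is our own.

```python
# Microsoft R&D figures from the example, keyed by year
rd = {2001: 4379, 2002: 6299, 2003: 6595, 2004: 7779, 2005: 6184}

def mean_absolute_deviation(values):
    """Average of |x_i - mu| over all N values, where mu is the population mean."""
    values = list(values)
    n = len(values)
    mu = sum(values) / n
    return sum(abs(x - mu) for x in values) / n

mad = mean_absolute_deviation(rd.values())
print(f"MAD = {mad:.2f}")  # MAD = 772.56
```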
Variance
- The variance, a common measure of dispersion, includes all data values and is calculated by a mathematical formula. For a population, the variance ($\sigma^2$, "sigma squared") is the average of squared differences between the N data values and the mean, $\mu$. For a sample variance ($s^2$), the sum of the squared differences between the n data values and the mean, $\bar{x}$, is divided by (n − 1). These calculations can be summarized as follows:

Population:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Sample:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where $\sigma^2$ = population variance
$s^2$ = sample variance
$x_i$ = the ith data value
$N$ = number of values in the population
$n$ = number of values in the sample


Standard Deviation
The positive square root of the variance of either a population or a sample is a quantity known as the standard deviation. The standard deviation is an especially important measure of dispersion because it is the basis for determining the proportion of data values within certain distances on either side of the mean for certain types of distributions (we will discuss these in a later chapter). The standard deviation may be expressed as:

For a Population:
$$\sigma = \sqrt{\sigma^2}$$

For a Sample:
$$s = \sqrt{s^2}$$
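The variance and standard deviation formulas can be sketched in Python as follows. For illustration only, the Microsoft R&D figures from the MAD example are treated once as a population (divide by N) and once as a sample (divide by n − 1); the helper `variance` is hypothetical, and the standard library's `statistics.pvariance` and `statistics.variance` are printed as a cross-check.

```python
import math
import statistics

rd = [4379, 6299, 6595, 7779, 6184]  # Microsoft R&D figures from the MAD example

def variance(values, sample=False):
    """Sum of squared deviations from the mean, divided by N (population) or n - 1 (sample)."""
    values = list(values)
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)
    return ss / (n - 1 if sample else n)

pop_var = variance(rd)
samp_var = variance(rd, sample=True)
pop_sd = math.sqrt(pop_var)    # standard deviation = positive square root of variance
samp_sd = math.sqrt(samp_var)

print(f"sigma^2 = {pop_var:.2f}, sigma = {pop_sd:.2f}")
print(f"s^2 = {samp_var:.2f}, s = {samp_sd:.2f}")
# The standard library implements the same two formulas
print(statistics.pvariance(rd), statistics.variance(rd))
```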
Chebyshev's Theorem
For either a sample or a population, the percentage of observations that fall within k (for k > 1) standard deviations of the mean will be at least

$$\left(1 - \frac{1}{k^2}\right) \times 100$$

Example:

The arithmetic mean biweekly amount contributed by the Dupree Paint employees to the company's profit-sharing plan is $51.54, and the standard deviation is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean?

Solution:
$$\left(1 - \frac{1}{k^2}\right) \times 100 = \left(1 - \frac{1}{(3.5)^2}\right) \times 100 = \left(1 - \frac{1}{12.25}\right) \times 100 \approx 91.8$$

Answer: At least 91.8% (about 92%) of the contributions lie within plus or minus 3.5 standard deviations of the mean.
Example 6 Chebyshev's Theorem
The mean income of a group of sample observations is $500; the standard deviation is $40. According to Chebyshev's theorem, at least what percent of the incomes will lie between $400 and $600?

Solution: Both limits are $100 from the mean, so k = 100/40 = 2.5 standard deviations.
$$\left(1 - \frac{1}{k^2}\right) \times 100 = \left(1 - \frac{1}{(2.5)^2}\right) \times 100 = 84\%$$

Answer: At least 84% of the incomes will lie between $400 and $600.
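Chebyshev's bound is simple enough to sketch directly in Python. The helpers below are hypothetical: `chebyshev_min_percent` evaluates (1 − 1/k²) × 100, and `k_for_interval` converts a symmetric interval such as $400 to $600 into a number of standard deviations, as Example 6 requires.

```python
import math

def chebyshev_min_percent(k):
    """Lower bound on the percent of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is stated for k > 1")
    return (1 - 1 / k**2) * 100

def k_for_interval(mean, sd, low, high):
    """Number of standard deviations between the mean and each limit of a symmetric interval."""
    if not math.isclose(mean - low, high - mean):
        raise ValueError("interval must be symmetric about the mean")
    return (high - mean) / sd

# Dupree Paint example: k = 3.5
print(f"{chebyshev_min_percent(3.5):.1f}%")  # 91.8%

# Example 6: mean $500, standard deviation $40, interval $400 to $600
k = k_for_interval(500, 40, 400, 600)
print(k, f"{chebyshev_min_percent(k):.1f}%")  # 2.5 84.0%
```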
Empirical Rule
For a symmetrical, bell-shaped frequency distribution, approximately 68 percent of the observations will lie within plus and minus one standard deviation of the mean; about 95 percent of the observations will lie within plus and minus two standard deviations of the mean; and practically all (99.7 percent) will lie within plus and minus three standard deviations of the mean.


Example 7 Empirical Rule
A sample of the rental rates at University Park Apartments approximates a symmetrical, bell-shaped distribution. The sample mean is $500; the standard deviation is $20. Using the Empirical Rule, answer these questions:

1. About 68 percent of the monthly rentals are between what two amounts?
2. About 95 percent of the monthly rentals are between what two amounts?
3. Almost all of the monthly rentals are between what two amounts?

Solution:

1. About 68% are between $480 and $520, found by X̄ ± 1s = $500 ± 1($20).
2. About 95% are between $460 and $540, found by X̄ ± 2s = $500 ± 2($20).
3. Almost all (99.7%) are between $440 and $560, found by X̄ ± 3s = $500 ± 3($20).
Example 8 Empirical Rule
The distribution of the weights of a sample of 1,400 cargo containers is symmetric and bell-shaped. According to the Empirical Rule, what percent of the weights will lie:

a. Between X̄ − 2s and X̄ + 2s?
b. Between X̄ and X̄ + 2s?
c. Below X̄ − 2s?

Solution:

a. 95%
b. 47.5% (by symmetry, half of the 95% lies on each side of the mean)
c. 2.5% (the remaining 5% is split equally between the two tails)
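The Empirical Rule calculations in Examples 7 and 8 can be sketched in Python as below. The coverage table simply encodes the rule's approximate 68/95/99.7 percentages; the helper `empirical_interval` is hypothetical, and the Example 8 split relies on the symmetry of a bell-shaped distribution.

```python
# Approximate coverage for a symmetrical, bell-shaped distribution (Empirical Rule)
EMPIRICAL_COVERAGE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_interval(mean, sd, k):
    """Interval mean +/- k*sd and the approximate percent of observations inside it."""
    return mean - k * sd, mean + k * sd, EMPIRICAL_COVERAGE[k]

# Example 7: rental rates, mean $500, standard deviation $20
for k in (1, 2, 3):
    low, high, pct = empirical_interval(500, 20, k)
    print(f"about {pct}% between ${low} and ${high}")

# Example 8: split the 95% for k = 2 using symmetry
within_2 = EMPIRICAL_COVERAGE[2]
mean_to_plus_2 = within_2 / 2          # between X-bar and X-bar + 2s
below_minus_2 = (100 - within_2) / 2   # tail below X-bar - 2s
print(mean_to_plus_2, below_minus_2)   # 47.5 2.5
```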
Grouped Data
The frequency distribution, also referred to as grouped data, is a convenient summary of raw data, but it loses some of the information originally contained in the data. As a result, measures of central tendency and dispersion determined from the frequency distribution will be only approximations of the actual values.

Approximate mean:
$$\bar{x} = \frac{\sum f_i m_i}{n}$$

Approximate variance:
$$s^2 = \frac{\sum f_i m_i^2 - n\bar{x}^2}{n - 1}$$

Approximate standard deviation:
$$s = \sqrt{s^2}$$

where $f_i$ = frequency of the ith class, $m_i$ = midpoint of the ith class, and $n = \sum f_i$ = total number of observations.


Example 9 Grouped Data
Consider the following data and find the mean, variance, and standard deviation.

Solution:
$$\bar{x} = \frac{\sum f_i m_i}{n} = \frac{1980}{50} = 39.6$$

$$s^2 = \frac{\sum f_i m_i^2 - n\bar{x}^2}{n - 1} = \frac{82{,}512.50 - 50(39.6)^2}{50 - 1} = 83.77$$

$$s = \sqrt{83.77} = 9.15$$
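The grouped-data formulas can be sketched in Python as below. Because the class frequencies and midpoints for Example 9 are not listed here, the sketch starts from the reported totals (Σ f_i m_i = 1980, Σ f_i m_i² = 82,512.50, n = 50); the second helper shows how those totals would be formed from a table of frequencies and midpoints. Both function names are hypothetical.

```python
import math

def grouped_stats_from_totals(sum_fm, sum_fm2, n):
    """Approximate mean, variance, and standard deviation from grouped-data totals."""
    mean = sum_fm / n
    var = (sum_fm2 - n * mean**2) / (n - 1)
    return mean, var, math.sqrt(var)

def grouped_stats(freqs, midpoints):
    """Same calculation starting from class frequencies f_i and class midpoints m_i."""
    n = sum(freqs)
    sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
    sum_fm2 = sum(f * m**2 for f, m in zip(freqs, midpoints))
    return grouped_stats_from_totals(sum_fm, sum_fm2, n)

# Example 9 totals as reported in the solution (class table not listed)
mean, var, sd = grouped_stats_from_totals(1980, 82512.50, 50)
print(f"mean = {mean:.1f}, s^2 = {var:.2f}, s = {sd:.2f}")  # mean = 39.6, s^2 = 83.77, s = 9.15
```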
