Chapter03

The document covers descriptive statistics, focusing on measures of location (mean, median, mode, percentiles), measures of variability (range, variance, standard deviation), and measures of distribution shape. It explains how to calculate these statistics using examples, particularly in the context of apartment rents and construction wages. Additionally, it highlights the importance of understanding both central tendency and variability in data analysis.

Uploaded by

Serdar

Descriptive Statistics: Numeric Measures

Chapter 3

Slide
1
Overview

Measures of Location
• Mean, Median, Mode, and Percentiles

Measures of Variability
• Range, Variance, Standard Deviation and
Coefficient of Variation

Measures of Distribution Shape,
Relative Location, and Outliers

Five-Number Summaries and Box Plots

Slide
2
Measures of Location

Slide
3
Measures of Location

• Mean
• Weighted Mean
• Median
• Geometric Mean
• Mode
• Percentiles
• Quartiles

If the measures are computed for data from a sample, they are called sample statistics.

If the measures are computed for data from a population, they are called population parameters.

A sample statistic is referred to as the point estimator of the corresponding population parameter.

Slide
4
Mean

• Perhaps the most important measure of location is the mean.
• The mean provides a measure of central location.
• The mean of a data set is the average of all the data values.
• The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.

Slide
5
Sample Mean $\bar{x}$

$$\bar{x} = \frac{\sum x_i}{n}$$

where:
$\sum x_i$ = sum of the values of the n observations
n = number of observations in the sample

Slide
6
Population Mean $\mu$

$$\mu = \frac{\sum x_i}{N}$$

where:
$\sum x_i$ = sum of the values of the N observations
N = number of observations in the population

Slide
7
Sample Mean

 Example: Apartment Rents


Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed below.

445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440

Slide
8
Sample Mean

 Example: Apartment Rents

$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$

445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
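
The computation above can be reproduced in a few lines. This is a minimal Python sketch rather than part of the original slides; the names `rents`, `total`, and `sample_mean` are chosen here for illustration, and the data are the 70 rents from the example, written in ascending order (the order has no effect on the mean).

```python
# Monthly rents for the 70 sampled apartments (values from the example,
# listed in ascending order; the order has no effect on the mean).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)            # number of observations in the sample
total = sum(rents)        # sum of the values of the n observations
sample_mean = total / n   # x-bar = (sum of x_i) / n

print(f"n = {n}, sum = {total:,}, mean = {sample_mean:.2f}")
```

Run as a script, this prints `n = 70, sum = 34,356, mean = 490.80`, matching the figures on the slide.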

Slide
9
Weighted Mean

• In some instances the mean is computed by giving each observation a weight that reflects its relative importance.
• The choice of weights depends on the application.
• The weights might be the number of credit hours earned for each grade, as in GPA.
• In other weighted mean computations, quantities such as pounds, dollars, or volume are frequently used.

Slide
10
Weighted Mean
If the data are from a population, $\mu$ replaces $\bar{x}$.

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$

Numerator: sum of the weighted data values
Denominator: sum of the weights

where:
$x_i$ = value of observation i
$w_i$ = weight for observation i

Slide
11
Weighted Mean

 Example: Construction Wages


Ron Butler, a home builder, is looking over the expenses he incurred for a house he just built. For the purpose of pricing future projects, he would like to know the average wage ($/hour) he paid the workers he employed. Listed below are the categories of worker he employed, along with their respective wage and total hours worked.

Worker        Wage ($/hr)   Total Hours
Carpenter     21.60         520
Electrician   28.72         230
Laborer       11.80         410
Painter       19.75         270
Plumber       24.16         160

Slide
12
Weighted Mean

 Example: Construction Wages

Worker        x_i      w_i     w_i x_i
Carpenter     21.60    520     11232.0
Electrician   28.72    230      6605.6
Laborer       11.80    410      4838.0
Painter       19.75    270      5332.5
Plumber       24.16    160      3865.6
Total                  1590    31873.7

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{31873.7}{1590} = 20.0464 \approx 20.05 \text{ dollars per hour}$$

FYI, equally-weighted (simple) mean = $21.21
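
The weighted and simple means for this example can be checked with a short Python sketch. It is an illustration only; the dictionary `wages` and the other variable names are assumptions introduced here, with hourly wages as the values x_i and total hours as the weights w_i.

```python
# Wage ($/hr) and total hours for each category of worker (from the example).
wages = {
    "Carpenter":   (21.60, 520),
    "Electrician": (28.72, 230),
    "Laborer":     (11.80, 410),
    "Painter":     (19.75, 270),
    "Plumber":     (24.16, 160),
}

# Numerator: sum of the weighted data values, sum(w_i * x_i).
weighted_sum = sum(wage * hours for wage, hours in wages.values())
# Denominator: sum of the weights, sum(w_i).
total_hours = sum(hours for _, hours in wages.values())

weighted_mean = weighted_sum / total_hours
# Equally weighted (simple) mean of the five wage rates, for comparison.
simple_mean = sum(wage for wage, _ in wages.values()) / len(wages)

print(f"weighted mean = {weighted_mean:.2f}, simple mean = {simple_mean:.2f}")
```

The weighted mean is lower than the simple mean because the lowest-paid category (laborer) accounts for a large share of the hours.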
Slide
13
Median

• The median of a data set is the value in the middle when the data items are arranged in ascending order.
• Whenever a data set has extreme values, the median is the preferred measure of central location.
• The median is the measure of location most often reported for annual income and property value data.
• A few extremely large incomes or property values can inflate the mean.

Slide
14
Median

 For an odd number of observations:

26 18 27 12 14 27 19 7 observations

12 14 18 19 26 27 27 in ascending order

the median is the middle value.

Median = 19

Slide
15
Median

 For an even number of observations:

26 18 27 12 14 27 30 19 8 observations

12 14 18 19 26 27 27 30 in ascending order

the median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5
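
The odd and even cases can be written as one small function. This is a sketch for illustration (the function name `median` and the list names are chosen here); Python's standard library also offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)          # arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the middle two


odd_sample = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_sample = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations

print(median(odd_sample), median(even_sample))
```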

Slide
16
Median

 Example: Apartment Rents


Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

Slide
17
Mode

• The mode of a data set is the value that occurs with greatest frequency.
• The greatest frequency can occur at two or more different values.
• If the data have exactly two modes, the data are bimodal.
• If the data have more than two modes, the data are multimodal.
• Caution: If the data are bimodal or multimodal, Excel’s MODE function will incorrectly identify a single mode.

Slide
18
Mode

 Example: Apartment Rents


450 occurred most frequently (7 times)
Mode = 450
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.
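
To avoid reporting a single mode when several values tie, one option is to return every value that reaches the highest frequency. The sketch below is illustrative; the function name `modes` is an assumption introduced here, and it relies only on `collections.Counter` from the standard library.

```python
from collections import Counter

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def modes(values):
    """Return (list of all values with the greatest frequency, that frequency)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top


rent_modes, frequency = modes(rents)
print(rent_modes, frequency)     # one mode for the rent data
print(modes([1, 1, 2, 2, 3]))    # a bimodal toy data set returns both modes
```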

Slide
19
Percentiles

• A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
• Admission test scores for colleges and universities are frequently reported in terms of percentiles.
• The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

Slide
20
Percentiles

Arrange the data in ascending order.

Compute index i, the position of the pth percentile:

i = (p/100)n

If i is not an integer, round up. The pth percentile is the value in the ith position.

If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
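
The steps above can be sketched as a small Python function. This is an illustration under stated assumptions: the name `percentile` is chosen here, p is taken to be an integer with 0 < p < 100, and integer arithmetic is used so that the test "is i an integer?" is exact. Library routines such as `numpy.percentile` interpolate by default and can return slightly different values.

```python
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(values, p):
    """pth percentile (integer p, 0 < p < 100) using the index rule i = (p/100)n."""
    ordered = sorted(values)                        # step 1: ascending order
    whole, remainder = divmod(p * len(ordered), 100)  # i = whole + remainder/100
    if remainder:
        # i is not an integer: round up to position whole + 1,
        # which is zero-based index `whole`.
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


print(percentile(rents, 80))   # i = 56, so positions 56 and 57 are averaged
print(percentile(rents, 75))   # i = 52.5, rounded up to position 53
```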

Slide
21
80th Percentile

 Example: Apartment Rents


i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.
Slide
22
80th Percentile

 Example: Apartment Rents


“At least 80% of the items take on a value of 542 or less.”
56/70 = .8 or 80%

“At least 20% of the items take on a value of 542 or more.”
14/70 = .2 or 20%

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Slide
23
Quartiles

 Quartiles are specific percentiles.


 First Quartile = 25th Percentile
 Second Quartile = 50th Percentile = Median
 Third Quartile = 75th Percentile

Slide
24
Third Quartile

 Example: Apartment Rents


Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = value in the 53rd position = 525
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.
Slide
25
Measures of Variability

Range, Interquartile Range, Variance, Standard Deviation, and Coefficient of Variation

Slide
26
Measures of Variability

• It is often desirable to consider measures of variability (dispersion), as well as measures of location.
• For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Slide
27
Range

• The range of a data set is the difference between the largest and smallest data values.
• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.

Slide
28
Range

 Example: Apartment Rents


Range = largest value - smallest value
Range = 615 - 425 = 190
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

Slide
29
Interquartile Range

• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.

Slide
30
Interquartile Range

 Example: Apartment Rents


3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.
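
The range and interquartile range for the rent data can be computed as follows. This is a sketch, not the slides' own method of calculation: the helper `percentile` is a hypothetical function written here to apply the index rule i = (p/100)n for integer p with 0 < p < 100.

```python
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(values, p):
    """pth percentile via i = (p/100)n: round up, or average positions i and i + 1."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2


data_range = max(rents) - min(rents)   # largest value - smallest value
q1 = percentile(rents, 25)             # first quartile
q3 = percentile(rents, 75)             # third quartile
iqr = q3 - q1                          # range of the middle 50% of the data

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```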
Slide
31
Variance

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

The variance is useful in comparing the variability of two or more variables.

Slide
32
Variance

The variance is the average of the squared differences between each data value and the mean.

The variance is computed as follows:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{(for a sample)}$$

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{(for a population)}$$

Slide
33
Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

Slide
34
Standard Deviation

The standard deviation is computed as follows:

$$s = \sqrt{s^2} \quad \text{(for a sample)}$$

$$\sigma = \sqrt{\sigma^2} \quad \text{(for a population)}$$

Slide
35
Coefficient of Variation

The coefficient of variation indicates how large the standard deviation is in relation to the mean.

The coefficient of variation is computed as follows:

$$\left( \frac{s}{\bar{x}} \times 100 \right)\% \quad \text{(for a sample)}$$

$$\left( \frac{\sigma}{\mu} \times 100 \right)\% \quad \text{(for a population)}$$

Slide
36
Sample Variance, Standard Deviation,
And Coefficient of Variation
 Example: Apartment Rents

$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Slide
37
Sample Variance, Standard Deviation,
And Coefficient of Variation
 Example: Apartment Rents

• Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$$

• Standard Deviation

$$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$$

• Coefficient of Variation

$$\left( \frac{s}{\bar{x}} \times 100 \right)\% = \left( \frac{54.74}{490.80} \times 100 \right)\% = 11.15\%$$

The standard deviation is about 11% of the mean.
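
The three results above can be reproduced directly from the sample formulas. The following Python sketch is illustrative (variable names are chosen here); it divides by n - 1 because the rents are a sample, which is also what `statistics.variance` in the standard library does.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = sum(rents) / n

# Sum of squared differences between each data value and the sample mean.
squared_deviations = sum((x - mean) ** 2 for x in rents)

sample_variance = squared_deviations / (n - 1)   # s^2, sample formula
sample_sd = math.sqrt(sample_variance)           # s = sqrt(s^2)
cv_percent = sample_sd / mean * 100              # (s / x-bar) * 100

print(f"s^2 = {sample_variance:.2f}, s = {sample_sd:.2f}, CV = {cv_percent:.2f}%")
```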

Slide
38
Measures of Distribution Shape,
Relative Location, and Detecting Outliers

Distribution Shape, z-Scores,


Chebyshev’s Theorem, Empirical Rule,
Detecting Outliers

Slide
39
Distribution Shape: Skewness
• An important measure of the shape of a distribution is called skewness.
• The formula for the skewness of sample data is

$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3$$

• Skewness can be easily computed using statistical software.
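
As a sketch of what such software does, the formula can be coded directly in Python. The function name `sample_skewness` is an assumption introduced here, and the data are the 70 apartment rents used throughout the chapter.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_skewness(values):
    """n / ((n-1)(n-2)) times the sum of cubed standardized deviations."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


skewness = sample_skewness(rents)
print(f"skewness = {skewness:.2f}")   # positive: the rent data are skewed right
```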

Slide
40
Distribution Shape: Skewness
 Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.

[Histogram: relative frequency (0 to .35); Skewness = 0]

Slide
41
Distribution Shape: Skewness
 Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.
[Histogram: relative frequency (0 to .35); Skewness = -.31]

Slide
42
Distribution Shape: Skewness
 Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.
[Histogram: relative frequency (0 to .35); Skewness = .31]

Slide
43
Distribution Shape: Skewness
 Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.

[Histogram: relative frequency (0 to .35); Skewness = 1.25]

Slide
44
Distribution Shape: Skewness
 Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a college town. The monthly rent prices for the apartments are listed below in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Slide
45
Distribution Shape: Skewness
 Example: Apartment Rents

[Histogram of apartment rents: relative frequency (0 to .35); Skewness = .92]

Slide
46
z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value $x_i$ is from the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

Excel’s STANDARDIZE function can be used to compute the z-score.

Slide
47
z-Scores

• An observation’s z-score is a measure of the relative location of the observation in a data set.
• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.

Slide
48
z-Scores

 Example: Apartment Rents

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Slide
49
z-Scores

 Example: Apartment Rents


• z-Score of Smallest Value (425)

$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
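
The table of standardized values can be regenerated with a short Python sketch. It is illustrative only; the names `z_scores`, `mean`, and `s` are chosen here, and s is the sample standard deviation computed from the data rather than the rounded 54.74.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = sum(rents) / n
s = math.sqrt(sum((x - mean) ** 2 for x in rents) / (n - 1))  # sample std. dev.

# z_i = (x_i - x-bar) / s for every observation.
z_scores = [(x - mean) / s for x in rents]

# Print ten standardized values per row, as in the table above.
for start in range(0, n, 10):
    print(" ".join(f"{z:5.2f}" for z in z_scores[start:start + 10]))
```

Because the rents are listed in ascending order, the first and last z-scores are the most extreme values.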

Slide
50
Chebyshev’s Theorem

At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.

Chebyshev’s theorem requires z > 1, but z need not be an integer.

Slide
51
Chebyshev’s Theorem

At least 75% of the data values must be within z = 2 standard deviations of the mean.

At least 89% of the data values must be within z = 3 standard deviations of the mean.

At least 94% of the data values must be within z = 4 standard deviations of the mean.

Slide
52
Chebyshev’s Theorem

 Example: Apartment Rents


Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.

At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$

(Actually, 86% of the rent values are between 409 and 573.)
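
The guaranteed bound and the actual share can be compared with a short Python sketch. It is an illustration with names chosen here; the standard deviation is computed from the data, so intermediate values may differ slightly from those obtained with the rounded 54.74.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = sum(rents) / n
s = math.sqrt(sum((x - mean) ** 2 for x in rents) / (n - 1))

z = 1.5
chebyshev_bound = 1 - 1 / z ** 2          # minimum share within z std. deviations
lower = mean - z * s
upper = mean + z * s

# Share of the rents that actually fall inside [lower, upper].
actual_share = sum(lower <= x <= upper for x in rents) / n

print(f"guaranteed: at least {chebyshev_bound:.0%} between {lower:.0f} and {upper:.0f}")
print(f"actual: {actual_share:.0%}")
```

The theorem gives only a lower bound that holds for any data set, which is why the actual share can be much larger.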

Slide
53
Empirical Rule

When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.

The empirical rule is based on the normal distribution, which is covered in Chapter 6.

Slide
54
Empirical Rule

For data having a bell-shaped distribution:

68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.

95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.

99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
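
These percentages come from areas under the normal curve, and they can be computed without a table. The sketch below is an illustration, not part of the slides; it uses the identity that the share of a normal random variable within k standard deviations of its mean equals erf(k / sqrt(2)), and the function name `share_within` is chosen here.

```python
import math


def share_within(k):
    """Probability that a normal random variable lies within k std. deviations of its mean."""
    return math.erf(k / math.sqrt(2))


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within +/- {k} standard deviation(s): {share:.2%}")
```

Computed this way, the shares round to 68.27%, 95.45%, and 99.73%. Figures that differ in the last digit, such as 68.26%, typically arise when the areas are read from a rounded normal table.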

Slide
55
Empirical Rule

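[Figure: bell-shaped curve over x, centered at $\mu$, with 68.26% of the area between $\mu - 1\sigma$ and $\mu + 1\sigma$, 95.44% between $\mu - 2\sigma$ and $\mu + 2\sigma$, and 99.72% between $\mu - 3\sigma$ and $\mu + 3\sigma$]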

Slide
56
Detecting Outliers

• An outlier is an unusually small or unusually large value in a data set.
• A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
• It might be:
  • an incorrectly recorded data value
  • a data value that was incorrectly included in the data set
  • a correctly recorded data value that belongs in the data set

Slide
57
Detecting Outliers

 Example: Apartment Rents


• The most extreme z-scores are -1.20 and 2.27
• Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27

Slide
58
Box Plot

Slide
59
Five-Number Summaries
and Box Plots

Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data.

Two tools that accomplish this are five-number summaries and box plots.

Slide
60
Five-Number Summary

1 Smallest Value

2 First Quartile

3 Median

4 Third Quartile

5 Largest Value

Slide
61
Five-Number Summary

 Example: Apartment Rents

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Slide
62
Box Plot

A box plot is a graphical summary of data that is based on a five-number summary.

A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.

Box plots provide another way to identify outliers.

Slide
63
Box Plot
 Example: Apartment Rents
• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).

[Box plot on an axis from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525 marked]
Slide
64
Box Plot

• Limits are located (not drawn) using the interquartile range (IQR).
• Data outside these limits are considered outliers.
• The location of each outlier is shown with the symbol * .

Slide
65
Box Plot
 Example: Apartment Rents
• The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

• The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
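
The limits, the outlier check, and the whisker ends can be sketched in Python as follows. This is an illustration under stated assumptions: the helper `percentile` is a hypothetical function written here to apply the index rule i = (p/100)n for integer p with 0 < p < 100, and the other names are likewise chosen for this example.

```python
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(values, p):
    """pth percentile via i = (p/100)n: round up, or average positions i and i + 1."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2


q1, q3 = percentile(rents, 25), percentile(rents, 75)
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr   # 1.5(IQR) below Q1
upper_limit = q3 + 1.5 * iqr   # 1.5(IQR) above Q3

# Outliers are data values outside the limits.
outliers = [x for x in rents if x < lower_limit or x > upper_limit]

# Whiskers extend to the smallest and largest data values inside the limits.
inside = [x for x in rents if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

print(f"limits: {lower_limit} to {upper_limit}, outliers: {outliers}")
print(f"whiskers: {whisker_low} to {whisker_high}")
```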

Slide
66
Box Plot
 Example: Apartment Rents
• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

[Box plot on an axis from 400 to 625, with whiskers extending to the smallest value inside the limits (425) and the largest value inside the limits (615)]
Slide
67
