
Measures of Central Tendency and Spread: Chapter 1, Section 2

Copyright
© Attribution Non-Commercial (BY-NC)

Measures of Central Tendency and Spread
Chapter 1, Section 2
Measures of Central Tendency
[Figure: two distributions with identical shapes; the black one is centered to the right of the red one.]


The Motivation
• Measures of central tendency are used to describe the typical member of a population.
• Depending on the type of data, “typical” could have a variety of “best” meanings.
• We will discuss four of these possible choices.
4 Measures of Central Tendency
• Mean – the arithmetic average. This is used for continuous
data.
• Median – a value that splits the data into two halves: one half of the data is smaller than that number, the other half larger. May be used for continuous or ordinal data.
• Mode – the category (or value) that occurs most often. As the description implies, it is used primarily for categorical data.
• Midrange – not used as often as the other three; it is found by taking the average of the lowest and highest values in the data set. Like the mean, it is primarily used for continuous data.
Mean
• To find the mean, add all of the values, then divide by the number of values.
• Population: the lowercase Greek letter mu (μ) is used for the population mean.
  $\mu = \frac{\sum x}{N}$
• Sample: an “x” with a bar over it, read x-bar, is used for the sample mean.
  $\bar{x} = \frac{\sum x}{n}$
Mean Example
Listing   X
1         14
2         17
3         31
4         28
5         42
6         43
7         51
8         51
9         66
10        70
11        67
12        70
13        78
14        62
15        47
Total     737   (n = 15)

x-bar = 737/15 = 49.13333
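The same arithmetic can be checked in a few lines of Python. This is a minimal sketch: the helper name `mean` is hypothetical, and because μ and x-bar use the identical formula, the only difference is whether the list holds a whole population (N values) or a sample (n values).

```python
# Hypothetical helper (not from the slides): the arithmetic mean.
def mean(values):
    """Add all of the values, then divide by the number of values."""
    return sum(values) / len(values)


# The 15 values from the Mean Example, in listing order.
x = [14, 17, 31, 28, 42, 43, 51, 51, 66, 70, 67, 70, 78, 62, 47]

total = sum(x)
x_bar = mean(x)
print(total, len(x))    # 737 15
print(round(x_bar, 5))  # 49.13333
```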
Median
• The median is a number chosen so that half of the
values in the data set are smaller than that number,
and the other half are larger.
• To find the median
– List the numbers in ascending order
– If there is a number in the middle (odd number of values), that number is the median
– If there is no middle number (even number of values), take the two numbers in the middle; their average is the median
Median Example
Listing  X  (n = 15)     Listing  X  (n = 16)
1        14              1        14
2        17              2        17
3        28              3        28
4        31              4        31
5        42              5        42
6        43              6        43
7        47              7        47
8        51              8        51
9        51              9        53
10       62              10       57
11       66              11       62
12       67              12       66
13       70              13       67
14       70              14       70
15       78              15       70
                         16       78

n = 15 (odd): the median is the 8th value, 51.
n = 16 (even): the median is (51 + 53)/2 = 52.
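The median steps translate directly into code. This is a sketch with a hypothetical helper `median`, cross-checked against Python's standard-library `statistics.median`; it is run on the two columns of the Median Example.

```python
import statistics


def median(values):
    """Sort the values, then take the middle one or the average of the two middle ones."""
    data = sorted(values)                     # list the numbers in ascending order
    n = len(data)
    mid = n // 2
    if n % 2 == 1:                            # odd count: there is a single middle number
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2    # even count: average the two middle numbers


odd = [14, 17, 28, 31, 42, 43, 47, 51, 51, 62, 66, 67, 70, 70, 78]
even = [14, 17, 28, 31, 42, 43, 47, 51, 53, 57, 62, 66, 67, 70, 70, 78]

print(median(odd))   # 51
print(median(even))  # 52.0

# Cross-check against the standard library.
assert median(odd) == statistics.median(odd)
assert median(even) == statistics.median(even)
```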
Mode
• The mode is simply the category or value that occurs most often in a data set.
• If a category has substantially more entries than the others, it is a mode.
• Generally speaking, we do not consider more than two modes in a data set.
• No clear guideline exists for deciding how many more entries a category must have than the others to constitute a mode.
Obvious Example
Beach Ball Production
[Figure: bar chart of beach ball production, in thousands, for blue, red, and yellow balls.]
• There is obviously more yellow than red or blue.
• Yellow is the mode.
• The mode is the class, not the frequency.
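Finding the mode amounts to counting how often each class occurs and picking the largest count. The sketch below uses `collections.Counter`. The color list is hypothetical and is not the chart's actual production data, which cannot be recovered from the slide. It also shows that the mode is the class itself, not its frequency.

```python
from collections import Counter

# Hypothetical records, one entry per batch of beach balls (NOT the chart's real data).
colors = ["yellow", "red", "yellow", "blue", "yellow", "red", "yellow"]

counts = Counter(colors)                          # frequency of each class
mode_class, frequency = counts.most_common(1)[0]  # class with the largest count

print(mode_class)  # yellow  (the mode is the class...)
print(frequency)   # 4       (...not the frequency)
```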


Bimodal
[Figure: “Geometry Scores For TASP”, bar chart of the frequency of each rating (very bad, bad, neutral, good, very good).]
No Mode
Category   Frequency
1          51
2          51
3          66
4          62
5          65
6          57
7          47
8          43
9          64
[Figure: bar chart of the frequencies above for categories 1 through 9.]
• Although the third category is the largest (66), it is barely ahead of categories 5 (65) and 9 (64), so it is not sufficiently different to be called the mode.
Midrange
• The midrange is the average of the lowest
and highest value in the data set.
• This measure is not often used since it is based strictly on the two extreme values in the data, so a single unusually large or small value can shift it substantially.
Midrange Example
X
14 (min)
17
28
31
42
43
47
51
51
62
66
67
70
70
78 (max)

midrange = (14 + 78)/2 = 46
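A short sketch of the midrange, using the hypothetical helper `midrange`. It also illustrates why the measure is rarely used. The replacement value 178 is a hypothetical change made only for this illustration: it moves the midrange a long way while leaving the median where it was.

```python
import statistics


def midrange(values):
    """Average of the lowest and highest values."""
    return (min(values) + max(values)) / 2


x = [14, 17, 28, 31, 42, 43, 47, 51, 51, 62, 66, 67, 70, 70, 78]
print(midrange(x))  # 46.0

# Hypothetical change: replace the maximum, 78, with 178.
x_changed = x[:-1] + [178]
print(midrange(x_changed))           # 96.0
print(statistics.median(x_changed))  # 51 (the median does not move)
```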
Measures of Variation
[Figure: two distributions, x and y, with the same mean, but y varies more than x.]
Three Measures of Variation
• While there are other measures, we will look at
only three:
– Variance
– Standard deviation
– Coefficient of variation
• The population mean and the sample mean are calculated with identical formulas.
• The formulas for variation, however, differ slightly between a population and a sample.
Population Variance
• The population variance, σ², is found using either of the formulas below.
  $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
  $\sigma^2 = \frac{\sum x^2}{N} - \mu^2$
• The differences are squared because the unsquared differences x − μ always sum to zero.
• N is the size of the population; μ is the population mean.
• Note that the variance is always positive unless every x in the population has the same value (in which case it is zero).
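The two population-variance formulas can be checked against each other numerically. This is a sketch on a hypothetical population of N = 8 values, chosen only because it gives round numbers. It also confirms that the unsquared deviations sum to zero.

```python
# Hypothetical population of N = 8 values (chosen for round numbers).
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)
mu = sum(population) / N                                        # 5.0

deviations = [x - mu for x in population]
print(sum(deviations))                                          # 0.0, which is why we square them

var_definition = sum(d ** 2 for d in deviations) / N            # sum of (x - mu)^2, divided by N
var_shortcut = sum(x ** 2 for x in population) / N - mu ** 2    # sum of x^2 / N, minus mu^2
print(var_definition, var_shortcut)                             # 4.0 4.0
```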
Population Standard Deviation
• The standard deviation can be thought of as
the average amount we could expect the x’s
in the population to differ from the mean
value of the population.
• To get the standard deviation, simply take
the square root of the variance.
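As a sketch, the population standard deviation is the square root of the population variance. The population below is hypothetical, and the result is compared with Python's `statistics.pstdev`, which computes the population standard deviation (dividing by N).

```python
import math
import statistics

population = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical population
N = len(population)
mu = sum(population) / N
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # population variance

sigma = math.sqrt(sigma_sq)           # standard deviation = square root of the variance
print(sigma)                          # 2.0
print(statistics.pstdev(population))  # standard-library population SD, also 2.0
```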
Sample Variance
• The sample variance, s², is found using either of the formulas below.
  $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
  $s^2 = \frac{\sum x^2}{n - 1} - \frac{\left(\sum x\right)^2}{n(n - 1)}$
• The differences are squared because the unsquared differences x − x̄ always sum to zero.
• The sample size is n; x-bar is the sample mean.
• Note that n − 1 is used rather than n. This adjustment makes s² an unbiased estimate of the population variance.
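A sketch that evaluates both sample-variance formulas on a hypothetical sample and compares them with `statistics.variance`, which also divides by n − 1. Dividing by n instead gives a smaller number. That smaller value is the underestimate the n − 1 adjustment corrects.

```python
import statistics

# Hypothetical sample of n = 8 values.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
x_bar = sum(sample) / n

# Definitional formula: squared deviations from x-bar, divided by n - 1.
s_sq_definition = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Computational formula: sum x^2 / (n - 1)  -  (sum x)^2 / (n (n - 1)).
sum_x = sum(sample)
sum_x_sq = sum(x ** 2 for x in sample)
s_sq_shortcut = sum_x_sq / (n - 1) - sum_x ** 2 / (n * (n - 1))

print(round(s_sq_definition, 4), round(s_sq_shortcut, 4))  # 4.5714 4.5714
print(round(statistics.variance(sample), 4))               # 4.5714
print(sum((x - x_bar) ** 2 for x in sample) / n)           # 4.0 (dividing by n instead)
```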
Sample Standard Deviation
• Just like the standard deviation of a
population, to find the standard deviation of
a sample, take the square root of the sample
variance.
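As a sketch, here the 15 values from the Mean Example are treated as a sample. The sample standard deviation is the square root of the sample variance, and it should agree with Python's `statistics.stdev`.

```python
import math
import statistics

# The 15 values from the Mean Example, treated as a sample.
x = [14, 17, 31, 28, 42, 43, 51, 51, 66, 70, 67, 70, 78, 62, 47]
n = len(x)
x_bar = sum(x) / n

s_sq = sum((v - x_bar) ** 2 for v in x) / (n - 1)   # sample variance (divides by n - 1)
s = math.sqrt(s_sq)                                 # sample standard deviation

print(round(s, 2))
print(round(statistics.stdev(x), 2))  # should match the line above
```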
Coefficient of Variation
• The measures discussed so far are primarily
useful when comparing members from the
same population, or comparing similar
populations.
• When looking at two or more dissimilar
populations, it doesn’t make any more sense
to compare standard deviations than it does
to compare means.
Coefficient of Variation Cont.
• Example 1: Weight loss programs A and B.
• Two different programs with the same goal and target population.
• While program B averages more weight loss, it also has less consistent results.

                                     A     B
Mean (weight loss per month)         20    25
Standard deviation                   15    30
Coefficient of Variation Cont.
• Example 2: Weight loss program A and tax refund B.
• Two different programs with different goals and different target populations.
• We know that average weight loss and average tax refund are not comparable. Are the standard deviations comparable?

                       A     B
Mean                   20    650
Standard deviation     15    30
Coefficient of Variation Cont.
• The last example shows that the standard deviation alone does not give the complete picture.
• The coefficient of variation addresses this issue by establishing a ratio of the standard deviation to the mean. This ratio is expressed as a percentage.

  $CV = \frac{100\,s}{\bar{x}}\%$ (sample)   or   $CV = \frac{100\,\sigma}{\mu}\%$ (population)
Coefficient of Variation Cont.
• Looking at the two examples, we see that in both cases the standard deviation for B is twice that of A.
• In the first example, B has 1.6 times the relative variation of A.
• In the second example, A has a little over 16 times as much relative variation as B.

              CV (A)    CV (B)
Example 1     75%       120%
Example 2     75%       4.6%
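The CV figures in the table follow directly from the means and standard deviations of the two examples. This sketch, using the hypothetical helper `coefficient_of_variation`, reproduces them along with the ratios quoted above.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: 100 * standard deviation / mean."""
    return 100 * sd / mean


# Standard deviations and means from the two examples.
cv_a = coefficient_of_variation(15, 20)     # program A (same in both examples)
cv_b1 = coefficient_of_variation(30, 25)    # Example 1, weight-loss program B
cv_b2 = coefficient_of_variation(30, 650)   # Example 2, tax refund B

print(cv_a, cv_b1, round(cv_b2, 1))  # 75.0 120.0 4.6
print(round(cv_b1 / cv_a, 2))        # 1.6   (Example 1: B relative to A)
print(round(cv_a / cv_b2, 2))        # 16.25 (Example 2: A relative to B)
```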
Measures of Position

The dot on the left is at about -1 and the dot on the right is at approximately 0.8. But where are they relative to the rest of the values in this distribution?
Quartiles, Percentiles and Other Fractiles
• We will only consider quartiles, but the same concept is often extended to percentiles or other fractions.
• The median is a good starting point for finding the
quartiles.
• Recall that to find the median, we wanted to locate
a point so that half of the data was smaller, and the
other half larger than that point.
Quartile
• For quartiles, we want to divide our data
into 4 equal pieces.

Suppose we had the following data set (already in order):

2 3 7 8 8 8 9 13 17 20 21 21

Choosing the numbers 7.5, 8.5, and 18.5 as markers would divide the data into 4 groups, each with three elements. These numbers would be the three quartiles for this data set.
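A sketch of one common convention, assumed here: the "median of halves" method. Q2 is the median, and Q1 and Q3 are the medians of the lower and upper halves, with the middle value left out when n is odd. The helper names are hypothetical. This method reproduces 7.5, 8.5, and 18.5. By contrast, Python's `statistics.quantiles` interpolates by default, so it gives different markers. That is one reason textbooks and software sometimes disagree about quartiles.

```python
import statistics


def median_sorted(data):
    """Median of an already-sorted list."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def quartiles(values):
    """Median-of-halves quartiles (the middle value is excluded when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + len(data) % 2:]
    return median_sorted(lower), median_sorted(data), median_sorted(upper)


data = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]
print(quartiles(data))                  # (7.5, 8.5, 18.5)

# The standard library's default (interpolating) method gives different markers.
print(statistics.quantiles(data, n=4))  # [7.25, 8.5, 19.25]
```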
Quartiles Continued
• Conceptually this is easy: find the median, then treat the left-hand side as if it were a data set and find its median; then do the same for the right-hand side.
• This is not always simple. Consider the following data set.
• 3 3 3 3 3 5 6 8 8 8 8 8 9
• The first difficulty is that the data set does not divide nicely: its 13 values cannot be split into four equal groups.
• Using the rules for finding a median, we would get quartiles of 3, 6 and 8.
• The second difficulty is deciding how many of the 3’s belong in the first quarter and how many in the second.
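The two difficulties can be seen numerically. This sketch applies the median-of-halves rule (an assumed convention, with a hypothetical helper `median_sorted`) to the 13-value data set. It recovers quartiles of 3, 6 and 8, then shows that five values are tied at Q1 even though a quarter of the data is only 3.25 values.

```python
def median_sorted(data):
    """Median of an already-sorted list."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


data = [3, 3, 3, 3, 3, 5, 6, 8, 8, 8, 8, 8, 9]  # already sorted, n = 13
half = len(data) // 2                           # 6 values on each side of the median
q1 = median_sorted(data[:half])                 # lower half: 3 3 3 3 3 5
q2 = median_sorted(data)                        # the middle (7th) value
q3 = median_sorted(data[half + 1:])             # upper half: 8 8 8 8 8 9
print(q1, q2, q3)                               # 3.0 6 8.0

# A quarter of 13 values is 3.25, yet five values sit exactly at Q1 = 3.
print(len(data) / 4, data.count(3))             # 3.25 5
```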
Quartiles Continued
• For this course, let’s pretend that this is not
an issue.
• I will give you the quartiles.
• I will not ask how many are in a quartile.
5 Number Summary
• The five-number summary consists of the minimum value, the three quartiles and the maximum value.
• This may be represented graphically with a box and whisker plot.
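A sketch of the five-number summary. It assumes the median-of-halves convention for the quartiles and uses the hypothetical helper `five_number_summary`. Applied to the 12-value data set from the quartile slide, it returns the minimum, the quartiles 7.5, 8.5 and 18.5, and the maximum. These five numbers are exactly what a box-and-whisker plot draws.

```python
def median_sorted(data):
    """Median of an already-sorted list."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median_sorted(data[:half])
    q2 = median_sorted(data)
    q3 = median_sorted(data[half + len(data) % 2:])
    return data[0], q1, q2, q3, data[-1]


print(five_number_summary([2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]))
# (2, 7.5, 8.5, 18.5, 21)
```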
Outliers
• Outliers are values in the data set which are either
suspiciously large or small.
• Such values may be the result of an error: the researcher may have measured incorrectly, or the results may have been typed incorrectly.
• Outliers may also be good data. There is always the chance that you have one basketball player in a set of ordinary people.
• The seven-foot height is not an error, but it is still unusually large.
Interquartile Range
• One method for identifying these outliers involves the use of quartiles.
• The interquartile range (IQR) is Q3 – Q1.
• All numbers less than Q1 – 1.5(IQR) are
probably too small.
• All numbers greater than Q3 + 1.5(IQR) are
probably too large.
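The 1.5(IQR) rule is straightforward to express in code. The sketch below again assumes median-of-halves quartiles, and `iqr_fences` and `outliers` are hypothetical helper names. The value 60 in the second data set is a hypothetical replacement used only to produce an outlier.

```python
def median_sorted(data):
    """Median of an already-sorted list."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def iqr_fences(values):
    """Return (Q1 - 1.5 IQR, Q3 + 1.5 IQR) using median-of-halves quartiles."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median_sorted(data[:half])
    q3 = median_sorted(data[half + len(data) % 2:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Values below the lower fence or above the upper fence."""
    low, high = iqr_fences(values)
    return [x for x in values if x < low or x > high]


data = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]
print(iqr_fences(data), outliers(data))                  # (-9.0, 35.0) []

# Hypothetical: replace the last 21 with 60.
data_with_60 = data[:-1] + [60]
print(iqr_fences(data_with_60), outliers(data_with_60))  # (-9.0, 35.0) [60]
```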
Using IQR to Find Outliers

The red lines mark a distance of 1.5 times the IQR, measured to the left from Q1 and to the right from Q3; together they establish the limits. All numbers smaller than the left limit or larger than the right limit are outliers.
Example
Linear Transformations
• When changing units, e.g., feet to meters,
degrees F to degrees C, we employ a linear
transformation.
– New = a + b Old
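• Measures of both center and spread are multiplied by “b” (for spread, strictly by |b|, and the variance by b²).
• Only measures of location (such as the mean, median, and quartiles) are shifted by “a”; measures of spread are unaffected by it.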
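A sketch using the Fahrenheit-to-Celsius conversion, C = (5/9)(F − 32), written in the form New = a + b Old with b = 5/9 and a = −160/9. The temperatures are hypothetical. The new mean equals a + b times the old mean. The new standard deviation equals |b| times the old one, and "a" plays no part in it.

```python
import statistics

# Hypothetical temperatures in degrees F.
temps_f = [50, 59, 68, 77, 86]

# Fahrenheit to Celsius written as New = a + b * Old.
b = 5 / 9
a = -32 * b
temps_c = [a + b * f for f in temps_f]

mean_f, mean_c = statistics.mean(temps_f), statistics.mean(temps_c)
sd_f, sd_c = statistics.pstdev(temps_f), statistics.pstdev(temps_c)

print(round(mean_c, 6), round(a + b * mean_f, 6))  # 20.0 20.0 (center: a + b * old center)
print(round(sd_c, 6), round(abs(b) * sd_f, 6))     # equal (spread: |b| * old spread, no "a")
```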
