Measures of the Spread of the Data (Ch2Sec7)
• A number line can help you picture what the standard deviation measures.
• Suppose the mean is 5 and the standard deviation is 2. On the number line, 7 lies to the right of 5.
• We say that 7 is one standard deviation to the right of 5 because 5 + (1)(2) = 7.
• If 1 were also part of the data set, then 1 would be two standard deviations to the left of 5
because 5 + (-2)(2) = 1.
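The number-line reasoning is the arithmetic value = mean + (number of standard deviations)(standard deviation). A minimal Python sketch, assuming (as the arithmetic above implies) a mean of 5 and a standard deviation of 2; the function name `value_at` is hypothetical, chosen only for illustration.

```python
def value_at(mean, sd, k):
    """Return the data value lying k standard deviations from the mean.

    Positive k moves to the right on the number line; negative k moves left.
    """
    return mean + k * sd

# Assumed from the example's arithmetic: mean 5, standard deviation 2.
MEAN, SD = 5, 2
print(value_at(MEAN, SD, 1))   # one SD to the right of 5 → 7
print(value_at(MEAN, SD, -2))  # two SDs to the left of 5 → 1
```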
MORE ON THE STANDARD DEVIATION
• The sample variance, s², equals the sum of the squared deviations
(the last column of the table, 9.7375) divided by the total number of
data values minus one (20 - 1):
• s² = 9.7375/(20 - 1) = 0.5125
• The sample standard deviation, s, equals the square root of
the sample variance:
• s = √0.5125 ≈ 0.715891
• Rounded to two decimal places, s = 0.72.
• Typically, you do the calculation for the standard deviation on
your calculator or computer. (This way the intermediate results
are not rounded, so the answer is more accurate.)
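The two steps above can be sketched in Python. Because the original data table is not reproduced here, this sketch starts from the stated sum of squared deviations (9.7375) and the number of data values (20) rather than from raw data; the helper name `sample_var_and_sd` is hypothetical. Only the final answer is rounded, as the slide recommends.

```python
import math

def sample_var_and_sd(sum_sq_dev, n):
    """Return (sample variance, sample standard deviation).

    sum_sq_dev is the sum of squared deviations from the mean;
    the sample variance divides it by n - 1.
    """
    variance = sum_sq_dev / (n - 1)
    return variance, math.sqrt(variance)

s2, s = sample_var_and_sd(9.7375, 20)
print(s2)           # about 0.5125 (up to floating-point rounding)
print(s)            # unrounded standard deviation, about 0.715891
print(round(s, 2))  # → 0.72
```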
EXAMPLE 2.32 – EXTRA PRACTICE
• The deviations show how spread out the data are about the mean.
(The data value 11.5 is farther away from the mean than is the
data value 11, which is indicated by the deviations 0.97 and 0.47,
respectively.)
• A positive deviation occurs when the data value is greater than the
mean, whereas a negative deviation occurs when the data value is
less than the mean. (The deviation is -1.525 for the data value
nine.)
• If you add the deviations, the sum is always zero. (This is why you
cannot simply add the deviations to measure the spread of the data.)
• Squaring the deviations makes them all non-negative, so they can no
longer cancel out, and their sum is positive unless every deviation is zero.
• The variance, then, is the average squared deviation.
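A short Python sketch of these points. Because the Example 2.32 data are not reproduced in this section, it uses a small made-up data set, `data`, purely for illustration: the deviations cancel to zero, while the squared deviations do not.

```python
import statistics

# Hypothetical data set, for illustration only (not the Example 2.32 data).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)              # 5.0
deviations = [x - mean for x in data]     # positive above the mean, negative below
squared = [d * d for d in deviations]     # squaring removes the signs

print(sum(deviations))           # → 0.0: positive and negative deviations cancel
print(sum(squared))              # → 32.0
print(sum(squared) / len(data))  # average squared deviation → 4.0
# Cross-check with the standard library's population variance (divides by N).
print(statistics.pvariance(data))
```

With non-integer data, the sum of the deviations may print as a tiny nonzero number such as 1e-15; that is floating-point rounding, not real spread.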
MORE ON THE STANDARD DEVIATION
• The variance is a squared measure, so it does not have the same
units as the data (for example, data in minutes give a variance in
minutes squared).
• Because it is the square root of the variance, the standard deviation
does have the same units as the data.
• For the sample variance, we divide by the sample size minus one
(n - 1) instead of the sample size (n). Why?
• The sample variance is used to estimate the population
variance. Dividing by n tends to underestimate it; dividing by
n - 1 corrects this, so that on average the sample variance equals
the population variance (it is an unbiased estimate).
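To see why dividing by n - 1 helps, this Python sketch takes a small made-up population, lists every possible sample of size 3 drawn with replacement, and averages both versions of the sample variance over all of those equally likely samples (which gives each estimator's exact expected value). The population values and sample size are assumptions for illustration. In Python's standard library, `statistics.variance` divides by n - 1 and `statistics.pvariance` divides by n; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product
import statistics

# Hypothetical small population, for illustration only.
population = [Fraction(v) for v in (1, 2, 3, 6)]
sigma2 = statistics.pvariance(population)       # population variance: divide by N

n = 3                                           # sample size
samples = list(product(population, repeat=n))   # every ordered sample, with replacement

# Average each estimator over all equally likely samples.
avg_div_n_minus_1 = sum(statistics.variance(s) for s in samples) / len(samples)
avg_div_n = sum(statistics.pvariance(s) for s in samples) / len(samples)

print(sigma2)             # → 7/2
print(avg_div_n_minus_1)  # → 7/2 (matches the population variance)
print(avg_div_n)          # → 7/3 (too small by a factor of (n - 1)/n)
```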
• The standard deviation of a sample or a population is always zero or
greater.
• When the standard deviation is 0, there is no spread (thus, all the data values
are equal to each other).
• The standard deviation is small when the data are all concentrated close to the
mean, and is larger when the data values show more variation from the mean.
• When the standard deviation is a lot larger than zero, the data values are very
spread out about the mean; outliers can make the standard deviation very
large.
• The standard deviation is very helpful in describing the spread of
symmetrical distributions; however, in skewed distributions, the standard
deviation is usually not of much help, because the two sides of a skewed
distribution have different spreads (unlike symmetrical distributions).
• For a skewed distribution, use the IQR to look at the spread of the data.
• Because numbers can sometimes be confusing, always graph your data (in a
histogram or box plot).
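A Python sketch of three of these points: a constant data set has a standard deviation of 0, a single outlier inflates the standard deviation, and the IQR barely reacts to it. The data sets and the helper `iqr` are hypothetical, for illustration only; `statistics.quantiles` (Python 3.8+) is used with its default method.

```python
import statistics

def iqr(values):
    """Interquartile range Q3 - Q1, using statistics.quantiles' default method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data sets, for illustration only.
constant = [5, 5, 5, 5]
typical = [4, 5, 5, 5, 6, 6, 6, 7, 7, 8]
with_outlier = [4, 5, 5, 5, 6, 6, 6, 7, 7, 40]  # the 8 replaced by an outlier

print(statistics.stdev(constant))       # no spread: standard deviation is 0
print(statistics.stdev(typical))        # about 1.2
print(statistics.stdev(with_outlier))   # about 10.9: one outlier inflates it
print(iqr(typical), iqr(with_outlier))  # the IQR is the same for both
```

The outlier also makes the second data set strongly right-skewed, which is the situation where the slide recommends the IQR.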
COMPARING VALUES FROM DIFFERENT DATA SETS
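• Two students from different high schools, John and Ali, wanted to
find out which of them had the higher GPA when compared to his school.
• John's GPA is 2.85 (school mean 3.0, standard deviation 0.7); Ali's GPA
is 77 (school mean 80, standard deviation 10).
• Which student had the higher GPA when compared to his school?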
• For each student, determine how many standard deviations his GPA is
away from the average, for his school.
• Pay careful attention to signs when comparing and interpreting the
answer.
• z=# of STDEVs=(value-mean)/standard deviation
• For John:
• z = # of STDEVs = (2.85-3.0)/0.7 = -0.21
• For Ali:
• z = # of STDEVs = (77-80)/10 = -0.3
• John's z-score (-0.21) is higher than Ali's (-0.3): John's GPA is 0.21
standard deviations below his school's mean, while Ali's GPA is 0.3
standard deviations below his school's mean.
• For GPA, higher values are better, so John has the better GPA when
compared to his school.
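The comparison can be sketched in Python using the values from the calculations above (John: 2.85 with school mean 3.0 and standard deviation 0.7; Ali: 77 with school mean 80 and standard deviation 10). `z_score` is a hypothetical helper name.

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies from `mean` (negative = below)."""
    return (value - mean) / sd

z_john = z_score(2.85, 3.0, 0.7)
z_ali = z_score(77, 80, 10)

print(round(z_john, 2))  # → -0.21
print(round(z_ali, 2))   # → -0.3
# Higher GPA is better, so the larger (less negative) z-score is the better result.
print("John" if z_john > z_ali else "Ali")  # → John
```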
STANDARD DEVIATION AND THE DISTRIBUTION OF DATA
• For ANY data set, no matter what the distribution of the data is:
• At least 75% of the data is within two standard deviations of
the mean.
• At least 89% of the data is within three standard deviations
of the mean.
• At least 95% of the data is within 4.5 standard deviations of
the mean.
• This is known as Chebyshev's rule: for any k > 1, at least 1 - 1/k² of
the data lie within k standard deviations of the mean.
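A Python sketch of the rule's bound, 1 - 1/k², alongside a check on a made-up right-skewed data set. The helpers `chebyshev_bound` and `fraction_within` and the data are assumptions for illustration; the sketch uses the population standard deviation (`statistics.pstdev`), for which the rule holds exactly. The printed bounds also show that the slide's 89% and 95% are rounded (1 - 1/9 ≈ 0.889 and 1 - 1/4.5² ≈ 0.951).

```python
import statistics

def chebyshev_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2

def fraction_within(values, k):
    """Observed fraction of values within k population standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)

# Hypothetical right-skewed data, for illustration only.
data = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]

for k in (2, 3, 4.5):
    # k, guaranteed minimum fraction, observed fraction
    print(k, round(chebyshev_bound(k), 3), fraction_within(data, k))
```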