Measures of the Spread of the Data (Ch2Sec7)

Uploaded by

jsabir2004js

SECTION 2.

Measures of the Spread of the Data
THE STANDARD DEVIATION

• An important characteristic of any set of data is the variation in the
data.
• In some data sets, the data values are concentrated closely near
the mean; in other data sets, the data values are more widely
spread out from the mean.
• The most common measure of variation, or spread, is the standard
deviation.
• The standard deviation is a number that measures how far data
values are from their mean.
• The standard deviation:
  • Provides a numerical measure of the overall amount of
    variation in a data set
  • Can be used to determine whether a particular data value is
    close to or far from the mean
PROVIDING A MEASURE OF OVERALL VARIATION

• The standard deviation is always positive or zero.
• The standard deviation is small when the data are all concentrated close
to the mean, exhibiting little variation or spread.
• The standard deviation is larger when the data values are more spread
out from the mean, exhibiting more variation.
• EX: Suppose we are studying the amount of time customers wait in line
at the checkout at supermarket A and supermarket B. The average wait
time at both supermarkets is five minutes.
• At supermarket A, the standard deviation for wait time is 2
minutes; at supermarket B, the standard deviation for the wait
time is 4 minutes.
• Because supermarket B has a higher standard deviation, we know
that there is more variation in the wait times at supermarket B.
• Overall, wait times at supermarket B are more spread out from the
average; wait times at supermarket A are more concentrated near
the average.
DETERMINE WHETHER A VALUE IS CLOSE OR FAR FROM THE MEAN

• Suppose that Rosa and Binh both shop at supermarket A.
• Rosa waits at the checkout counter for 7 minutes, and Binh waits
for 1 minute.
• At supermarket A, the mean waiting time is 5 minutes, and the
standard deviation is 2 minutes.
• The standard deviation can be used to determine whether a data
value is close to or far from the mean.
• Rosa waits for 7 minutes:
• Seven is 2 minutes longer than the average of 5; 2 minutes
is equal to one standard deviation.
• Rosa’s wait time of 7 minutes is two minutes longer than
the average of 5 minutes.
• Rosa’s wait time of 7 minutes is one standard deviation
above the average of 5 minutes.
DETERMINE WHETHER A VALUE IS CLOSE OR FAR FROM THE MEAN

• Binh waits for 1 minute:
• One is 4 minutes less than the average of 5; 4 minutes is
equal to two standard deviations.
• Binh’s wait time of 1 minute is four minutes less than the
average of 5 minutes.
• Binh’s wait time of 1 minute is two standard deviations
below the average of 5 minutes.
• A data value that is two standard deviations from the average is just
on the borderline for what many statisticians would consider to be far
from the average.
• Considering data to be far from the mean if it is more than two
standard deviations away is more of an approximate “rule of thumb”
than a rigid rule.
• In general, the shape of the distribution of the data affects how much
of the data lies more than two standard deviations from the mean (more
on this in later chapters).
FIGURE 2.24

• A number line may help you understand the standard deviation better.
• If we were to put 5 and 7 on the number line, 7 is to the right of 5.
• We say that 7 is one standard deviation to the right of 5 because 5 + (1)(2) = 7.
• If 1 were also part of the data set, then 1 is two standard deviations to the left of 5
because 5 + (-2)(2) = 1.
MORE ON THE STANDARD DEVIATION

• In general, a value = mean + (#ofSTDEV)(standard deviation), where
  • #ofSTDEV = the number of standard deviations
  • #ofSTDEV does not need to be an integer
• 1 is two standard deviations less than the mean of 5 because
1 = 5 + (-2)(2).
• The equation value = mean + (#ofSTDEV)(standard deviation) can
be expressed for a sample and for a population.
  • Sample: x = x̄ + (#ofSTDEV)(s)
  • Population: x = μ + (#ofSTDEV)(σ)
• Where
  • s is the sample standard deviation
  • σ is the population standard deviation
  • x̄ is the sample mean
  • μ is the population mean
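
As a rough illustration of the relationship value = mean + (#ofSTDEV)(standard deviation), the Python sketch below converts in both directions using the supermarket A numbers (mean 5 minutes, standard deviation 2 minutes). The function names `value_at` and `num_stdevs` are illustrative choices and do not come from the original slides.

```python
def value_at(mean, num_stdev, stdev):
    """Return the value lying num_stdev standard deviations from the mean."""
    return mean + num_stdev * stdev


def num_stdevs(value, mean, stdev):
    """Return how many standard deviations a value lies from the mean."""
    return (value - mean) / stdev


# Supermarket A: mean wait of 5 minutes, standard deviation of 2 minutes.
rosa = num_stdevs(7, mean=5, stdev=2)   # 1.0: one standard deviation above
binh = num_stdevs(1, mean=5, stdev=2)   # -2.0: two standard deviations below
print(rosa, binh)

# Going the other way: the value two standard deviations below the mean.
print(value_at(5, -2, 2))               # 1
```

A negative result means the value is below the mean; a non-integer result such as 1.5 is equally valid.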
CALCULATING THE STANDARD DEVIATION

• If x is a number, then the difference “x – mean” is called its deviation.
• In a data set, there are as many deviations as there are items in the
data set.
• The deviations are used to calculate the standard deviation.
• If the numbers belong to a population, in symbols, a deviation is x – μ.
• For sample data, in symbols, a deviation is x – x̄.
• The procedure to calculate the standard deviation depends on whether
the numbers are the entire population or are from a sample. (The
calculations are similar, but not identical.)
• The symbol used to represent the standard deviation depends on
whether it is calculated from a population or a sample.
• The lower case letter s represents the sample standard deviation, and
the Greek letter σ (sigma) represents the population standard deviation.
• If the sample has the same characteristics as the population, then s
should be a good estimate of σ.
MORE ON CALCULATIONS

• To calculate the standard deviation, we need to calculate the
variance first.
• The variance is the average of the squares of the deviations.
• The symbol σ² represents the population variance; the population
standard deviation is the square root of the population variance.
• The symbol s² represents the sample variance; the sample standard
deviation is the square root of the sample variance.
• If the numbers come from a census of the entire population and
not a sample, when we calculate the average of the squared
deviations to find the variance, we divide by N, the number of
items in the population.
• If the data are from a sample, rather than a population, when we
calculate the average of the squared deviations, we divide by
n – 1, one less than the number of items in the sample.
FORMULAS FOR THE STANDARD DEVIATION

• To calculate the sample standard deviation:
  • $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\dfrac{\sum f\,(x - \bar{x})^2}{n - 1}}$
  • For the sample standard deviation, the denominator is n – 1,
    the sample size minus 1.
• To calculate the population standard deviation:
  • $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$ or $\sigma = \sqrt{\dfrac{\sum f\,(x - \mu)^2}{N}}$
  • For the population standard deviation, the denominator is N,
    the number of items in the population.
• In these formulas, f represents the frequency with which a value
appears. For example, if a value appears once, f is 1. If a value
appears three times in the data set or population, f is 3.
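
The frequency form of the formulas can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `stdev_from_frequencies`, its `sample` flag, and the small frequency table are all hypothetical and chosen only for the example.

```python
import math


def stdev_from_frequencies(freq, sample=True):
    """Standard deviation from a {value: frequency} table.

    Divides by n - 1 for a sample and by N for a population.
    """
    n = sum(freq.values())                            # total number of data values
    mean = sum(x * f for x, f in freq.items()) / n    # frequency-weighted mean
    # Sum of f * (x - mean)^2 over the distinct values.
    sum_sq = sum(f * (x - mean) ** 2 for x, f in freq.items())
    denominator = n - 1 if sample else n
    return math.sqrt(sum_sq / denominator)


# Hypothetical data: 2 appears once, 4 appears three times, 6 appears once.
table = {2: 1, 4: 3, 6: 1}
s = stdev_from_frequencies(table, sample=True)        # sqrt(8 / 4)
sigma = stdev_from_frequencies(table, sample=False)   # sqrt(8 / 5)
print(round(s, 4), round(sigma, 4))
```

For the same data the sample result is always at least as large as the population result, because the same sum of squared deviations is divided by a smaller number.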
EXAMPLE 2.32

• In a fifth grade class, the teacher was interested in the average
age and the sample standard deviation of the ages of her
students.
• The following data are the ages for a sample of n = 20 fifth grade
students.
• The ages are rounded to the nearest half year:
  • 9; 9.5; 9.5; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 11; 11; 11;
    11; 11; 11; 11.5; 11.5; 11.5
• The sample mean is:
  • $\bar{x} = \dfrac{9 + 9.5(2) + 10(4) + 10.5(4) + 11(6) + 11.5(3)}{20} = 10.525$
• The average age is 10.53 years, rounded to two places.
EXAMPLE 2.32 CONTINUED

• The variance may be calculated using a table.
• Then the standard deviation is calculated by taking the square root
of the variance.

| Data x | Freq. f | Deviations (x – x̄) | Deviations² (x – x̄)² | (Freq.)(Deviations²) f(x – x̄)² |
|---|---|---|---|---|
| 9 | 1 | 9 – 10.525 = -1.525 | (-1.525)² = 2.325625 | 1(2.325625) = 2.325625 |
| 9.5 | 2 | 9.5 – 10.525 = -1.025 | (-1.025)² = 1.050625 | 2(1.050625) = 2.101250 |
| 10 | 4 | 10 – 10.525 = -0.525 | (-0.525)² = 0.275625 | 4(0.275625) = 1.1025 |
| 10.5 | 4 | 10.5 – 10.525 = -0.025 | (-0.025)² = 0.000625 | 4(0.000625) = 0.0025 |
| 11 | 6 | 11 – 10.525 = 0.475 | (0.475)² = 0.225625 | 6(0.225625) = 1.35375 |
| 11.5 | 3 | 11.5 – 10.525 = 0.975 | (0.975)² = 0.950625 | 3(0.950625) = 2.851875 |
| | | | | Total is 9.7375 |
EXAMPLE 2.32 CONTINUED

• The sample variance, s², is equal to the sum of the last column
(9.7375) divided by the total number of data values minus one (20 – 1):
  • s² = 9.7375/(20 – 1) = 0.5125
• The sample standard deviation s is equal to the square root of
the sample variance:
  • s = √0.5125 = 0.715891
  • s is then rounded to two decimal places, so s = 0.72.
• Typically, you do the calculation for the standard deviation on
your calculator or computer. (This way the intermediate results
are not rounded, so the answer is more accurate.)
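
One way to do this check on a computer is with Python's standard `statistics` module, as sketched below. The variable names are illustrative; `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n – 1), which is what this example calls for.

```python
import statistics

# Ages of the n = 20 fifth grade students from Example 2.32.
ages = [9, 9.5, 9.5, 10, 10, 10, 10, 10.5, 10.5, 10.5, 10.5,
        11, 11, 11, 11, 11, 11, 11.5, 11.5, 11.5]

mean = statistics.mean(ages)          # sample mean
variance = statistics.variance(ages)  # sample variance (divides by n - 1)
s = statistics.stdev(ages)            # sample standard deviation

# Should agree with the table: 10.525, 0.5125, and 0.72 after rounding.
print(len(ages), round(mean, 3), round(variance, 4), round(s, 2))
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead.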
EXAMPLE 2.32 – EXTRA PRACTICE

• For the previous problem, using a calculator or computer:
  • Verify the mean
  • Verify the standard deviation
  • Find the value that is one standard deviation above the
    mean.
  • Find the value that is two standard deviations below the
    mean.
  • Find the values that are 1.5 standard deviations from
    (below and above) the mean.
• Check your answers from your calculator or computer with
the solutions on p.112 in the book.
MORE ON THE STANDARD DEVIATION

• The deviations show how spread out the data are about the mean.
(The data value 11.5 is farther away from the mean than is the
data value 11, which is indicated by the deviations 0.975 and 0.475,
respectively.)
• A positive deviation occurs when the data value is greater than the
mean, whereas a negative deviation occurs when the data value is
less than the mean. (The deviation is -1.525 for the data value
nine.)
• If you add the deviations, the sum is always zero. (This is why you
cannot simply add the deviations to get the spread of the data.)
• By squaring the deviations, you make them nonnegative, so their
sum is positive whenever the data values are not all equal.
• The variance, then, is the average squared deviation.
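
The claim that the deviations always sum to zero follows from the definition of the sample mean, x̄ = (Σx)/n, which gives Σx = n·x̄. A short derivation, using the index i only to label the n data values:

```latex
\[
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
\]
```

The same argument works for a population, with μ in place of x̄ and N in place of n.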
MORE ON THE STANDARD DEVIATION

• The variance is a squared measure and does not have the same
units as the data.
• However, the standard deviation does have the same units as the
data due to taking the square root.
• For the sample variance, we divide by the sample size minus one
instead of the sample size. Why?
• The sample variance is an estimate of the population
variance, and based on the theoretical mathematics that lies
behind these calculations, dividing by the sample size minus
one gives a better estimate of the population variance.
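
The effect of dividing by n – 1 can be seen in a small experiment. The sketch below makes several assumptions that are not in the slides: a hypothetical population of four values, samples of size 2, and sampling with replacement. It lists every possible sample and averages the two competing variance estimates.

```python
import itertools
import statistics

population = [1, 2, 3, 4]                     # hypothetical tiny population
sigma_sq = statistics.pvariance(population)   # true population variance (divide by N)

# Every possible ordered sample of size 2, drawn with replacement.
samples = list(itertools.product(population, repeat=2))

# Average each estimate over all 16 samples.
# pvariance divides by n; variance divides by n - 1.
avg_div_n = sum(statistics.pvariance(smp) for smp in samples) / len(samples)
avg_div_n_minus_1 = sum(statistics.variance(smp) for smp in samples) / len(samples)

print(sigma_sq, avg_div_n, avg_div_n_minus_1)
```

Under these assumptions the estimates that divide by n – 1 average out to the true population variance, while the estimates that divide by n come out too small on average.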
MORE ON THE STANDARD DEVIATION

• The standard deviation for the sample or population is either zero or larger than
zero.
• When the standard deviation is 0, there is no spread (thus, all the data values
are equal to each other).
• The standard deviation is small when the data are all concentrated close to the
mean, and is larger when the data values show more variation from the mean.
• When the standard deviation is a lot larger than zero, the data values are very
spread out about the mean; outliers can make the standard deviation very
large.
• The standard deviation is very helpful in describing the spread of
symmetrical distributions; however, in skewed distributions, the standard
deviation is usually not of much help, because the two sides of a skewed
distribution have different spreads (unlike symmetrical distributions).
• For a skewed distribution, use the IQR to look at the spread of the data.
• Because numbers can sometimes be confusing, always graph your data (in a
histogram or box plot).
COMPARING VALUES FROM DIFFERENT DATA SETS

• The standard deviation is useful when comparing data values that
come from different data sets.
• If the data sets have different means and standard deviations, then
comparing the data values directly can be misleading.
• To directly compare scores from different data sets, we use a
z-score, which has the following formula:
  • Sample: x = x̄ + (z)(s), so z = (x – x̄)/s
  • Population: x = μ + (z)(σ), so z = (x – μ)/σ
EXAMPLE 2.35

• Two students, John and Ali, from different high schools, wanted to
find out who had the higher GPA when compared to his school.
• Which student had the higher GPA when compared to his school?

| Student | GPA | School Mean GPA | School Standard Deviation |
|---|---|---|---|
| John | 2.85 | 3.0 | 0.7 |
| Ali | 77 | 80 | 10 |
EXAMPLE 2.35

• For each student, determine how many standard deviations his GPA is
away from the average, for his school.
• Pay careful attention to signs when comparing and interpreting the
answer.
• z = # of STDEVs = (value – mean)/(standard deviation)
• For John:
  • z = # of STDEVs = (2.85 – 3.0)/0.7 = -0.21
• For Ali:
  • z = # of STDEVs = (77 – 80)/10 = -0.3
• John has the better GPA when compared to his school because his
GPA is 0.21 standard deviations below his school’s mean, while Ali’s
GPA is 0.3 standard deviations below his school’s mean.
• John’s z-score of -0.21 is higher than Ali’s z-score of -0.3.
• For GPA, higher values are better, so we conclude that John has the
better GPA when compared to his school.
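
The comparison in this example can be reproduced with a few lines of Python. This is only a sketch; the helper name `z_score` is an illustrative choice.

```python
def z_score(value, mean, stdev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / stdev


# Figures from the table in Example 2.35.
john = z_score(2.85, mean=3.0, stdev=0.7)
ali = z_score(77, mean=80, stdev=10)
print(round(john, 2), round(ali, 2))   # -0.21 -0.3

# For GPA, the higher z-score is the better result relative to the school.
better = "John" if john > ali else "Ali"
print(better)                          # John
```

Because both z-scores are negative, the sign matters: -0.21 is greater than -0.3, so John is closer to (and less far below) his school's mean.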
STANDARD DEVIATION AND THE DISTRIBUTION OF DATA

• For ANY data set, no matter what the distribution of the data is:
• At least 75% of the data is within two standard deviations of
the mean.
• At least 89% of the data is within three standard deviations
of the mean.
• At least 95% of the data is within 4.5 standard deviations of
the mean.
• This is known as Chebyshev’s rule.
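
The three percentages above come from the general form of Chebyshev's rule, which the slides do not write out: for any k > 1, at least 1 – 1/k² of the data lies within k standard deviations of the mean. A minimal Python sketch (the function name is an illustrative choice):

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4.5):
    # Prints the bound as a percentage: 75.0, 88.9, 95.1
    print(k, round(chebyshev_lower_bound(k) * 100, 1))
```

These are guaranteed minimums, so a real data set will often have a larger share of its values inside each interval.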
STANDARD DEVIATION AND THE DISTRIBUTION OF DATA

• For data having a distribution that is BELL-SHAPED and
SYMMETRIC:
• Approximately 68% of the data is within one standard
deviation of the mean.
• Approximately 95% of the data is within two standard
deviations of the mean.
• More than 99% of the data is within three standard
deviations of the mean.

• This is known as the Empirical Rule.
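
The Empirical Rule percentages match what a normal distribution gives. The sketch below assumes that "bell-shaped and symmetric" is modeled by a standard normal distribution (an assumption made here for illustration) and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints roughly 68.27, 95.45, and 99.73 percent.
    print(k, round(fraction_within(k) * 100, 2))
```

Real data that are only approximately bell-shaped will match these figures only approximately.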


EMPIRICAL RULE PICTURE (WIKIPEDIA.ORG)
This OpenStax ancillary resource is © Rice University under a CC-BY 4.0
International license; it may be reproduced or modified but must be
attributed to OpenStax, Rice University and any changes must be noted.
