CH 10 Analysing Data
STATISTICAL ANALYSIS
ANALYSING DATA
The meteorologists at the Bureau of Meteorology measure and record weather data from over 1000
sites in Australia and Antarctica, and calculate statistics about temperature, rainfall and humidity.
Climate averages, such as the median monthly rainfall or the mean number of rainy days per month,
are calculated from weather data gathered over many years and can assist farmers to decide on the
best times to plant crops.
CHAPTER OUTLINE
S1.2 10.01 The mean, median and mode
S1.2 10.02 Quartiles, deciles and percentiles
S1.2 10.03 The range and interquartile range
S1.2 10.04 The effect of outliers
S1.1 10.05 Cumulative frequency graphs
S1.2 10.06 Box plots
S1.2 10.07 Standard deviation
S1.2 10.08 The shape of a distribution
IN THIS CHAPTER YOU WILL:
• calculate and interpret the mean, median and mode of sets of data, including ungrouped data
• calculate and interpret the quartiles, deciles and percentiles of a set of data
• calculate and interpret the range, interquartile range and standard deviation of sets of data,
including ungrouped data
• identify outliers in a set of data and examine their effects on statistical measures
• calculate cumulative frequency and construct cumulative frequency histograms and polygons
• use a cumulative frequency polygon to find the median, quartiles and interquartile range of a
data set
• use a five-number summary to construct box plots
• describe the shape of a distribution using its graph or display
TERMINOLOGY
box plot class centre class interval
cumulative frequency cumulative frequency histogram cumulative frequency polygon
decile distribution extremes
five-number summary interquartile range mean
measure of central tendency measure of spread median
median class modal class mode
ogive outlier peak
percentile population quartile
range sample standard deviation
summary statistics symmetrical
SkillCheck
1 This stem-and-leaf plot shows the ages of visitors entering the Royal Easter Show in a five-minute period.
Stem Leaf
0 3 8 9
1 0 2 2 2 5 6 7 9
2 0 2 3 4 6 7
3 1 3 3 4 9
4 3 4 7 8
5 5 5 8
a How many visitors entered the show during the five-minute period?
b What was the age of the oldest visitor?
c What was the most common age?
d How many visitors were under 16 years old?
e What was the middle age?
Copy and complete the frequency table below using the data about the mass of the
skydivers.
50 – < 60
60 – < 70
70 – < 80
80 – < 90
90 – < 100
100 – < 110
The mean
\( \text{Mean, } \bar{x} = \dfrac{\text{sum of scores}}{\text{number of scores}} = \dfrac{\sum x}{n} \)
where Σ means ‘the sum of’, x represents a score and n is the total number of scores.
If the scores in a data set are presented in a frequency distribution table then, by adding an
‘fx’ column, the mean can be calculated using the formula shown below.
\( \bar{x} = \dfrac{\sum fx}{\sum f} \)
b A stem-and-leaf plot of the marks (out of 100) in a maths test for a class of students:
Stem Leaf
4 4 7 7 8
5 2 6 8 9 9
6 1 3 5 5 7 8
7 0 2 3 4 5
8 3 7 8
9 2 8
Solution
21 25 26 28 28 30 31 32 32 32 33 34 35 35
For 14 scores, the middle scores are the 7th and 8th scores.
\( \text{Median} = \dfrac{31 + 32}{2} = 31.5 \)
b i Mean ≈ 66.8 (the sum of the 25 marks, 1671, divided by 25)
ii Stem Leaf
4 4 7 7 8
5 2 6 8 9 9
6 1 3 5 5 7 8
7 0 2 3 4 5
8 3 7 8
9 2 8
For 25 scores, the middle score is the 13th score.
Median = 65
iii Mode = 47, 59 and 65
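The three measures in this example can also be checked with a few lines of code. The sketch below uses Python's standard `statistics` module. The variable names are our own, and the marks are read off the stem-and-leaf plot in part b.

```python
from statistics import mean, median, multimode

# Maximum daily temperatures (°C) in Mudgee, from part a
temps = [30, 28, 26, 31, 34, 35, 32, 33, 21, 25, 28, 32, 32, 35]

# Test marks read from the stem-and-leaf plot in part b
marks = [44, 47, 47, 48, 52, 56, 58, 59, 59, 61, 63, 65, 65,
         67, 68, 70, 72, 73, 74, 75, 83, 87, 88, 92, 98]

temps_mean = mean(temps)        # sum of scores divided by number of scores
temps_median = median(temps)    # average of the 7th and 8th ordered scores
marks_mean = mean(marks)
marks_median = median(marks)    # the 13th of the 25 ordered scores
marks_modes = multimode(marks)  # every score tied for the highest frequency

print(round(temps_mean, 1), temps_median)
print(round(marks_mean, 1), marks_median, marks_modes)
```

`multimode` is used rather than `mode` because a data set can have more than one mode, as the marks do here.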
STAT EDIT
Clear the statistical memory. With cursor in List 1 column With cursor on L1
EXIT F6 DEL-A Yes CLEAR ENTER
Enter data. 30 EXE 28 EXE , etc. to enter 30 ENTER 28 ENTER , etc. to enter
in List 1 column in List 1 column
Calculate the mean. F6 CALC SET to make STAT CALC 1-Var Stats ENTER
( x = 30.142 85…) these settings (if different): to calculate many statistics
Calculate the sum of scores. 1Var XList: List 1 (scroll down for more)
(Σx = 422) 1Var Freq: 1
Check the number of scores. EXE
(n = 14) 1VAR to calculate many
EXAMPLE 2
The scores for the players in a nine-hole golf competition were sorted into the frequency table.
Score (x) Frequency (f)
37 2
38 4
39 7
40 4
41 1
a How many players were there?
b For this data, find:
i the mean (correct to one decimal place)
ii the mode
iii the median.
Solution
a 18 players (the sum of f is 18)
b i \( \text{Mean, } \bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{700}{18} = 38.888\,8\ldots \approx 38.9 \) (the sum of all 18 scores is 700)
ii Mode = 39 39 has the highest frequency, 7
iii To the table, add a cumulative frequency column which keeps a running total of
the frequencies.
Score (x) Frequency ( f ) Cumulative frequency
37 2 2
38 4 6 2+4=6
39 7 13 6 + 7 = 13, etc.
40 4 17
41 1 18
Because there are 18 scores, the two middle scores are the 9th and 10th scores.
Reading from the cumulative frequency column, the 6th score is 38, and the 13th
score is a 39, so the 9th and 10th scores must both be 39.
\( \text{Median} = \dfrac{39 + 39}{2} = 39 \)
Note that the mean (38.9), median (39) and mode (39) are all at the centre of the data set.
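The working for a frequency table can be sketched in code as well. The Python below rebuilds Σf, Σfx and the cumulative frequency column for the golf scores, then reads the median from the cumulative frequencies in the same way as part iii. The helper `score_at` is a hypothetical name of our own, and the mode line assumes there is a single mode.

```python
from itertools import accumulate

scores = [37, 38, 39, 40, 41]   # x
freqs = [2, 4, 7, 4, 1]         # f

n = sum(freqs)                                      # sum of f
total = sum(f * x for x, f in zip(scores, freqs))   # sum of fx
mean = total / n
mode = scores[freqs.index(max(freqs))]   # assumes there is only one mode
cum_freqs = list(accumulate(freqs))      # running total of the frequencies


def score_at(position):
    """Return the score in the given (1-based) position of the ordered data."""
    for x, cf in zip(scores, cum_freqs):
        if position <= cf:
            return x
    raise ValueError("position is larger than the number of scores")


if n % 2 == 0:
    # even number of scores: average the two middle scores
    median = (score_at(n // 2) + score_at(n // 2 + 1)) / 2
else:
    median = score_at((n + 1) // 2)

print(n, total, round(mean, 1), mode, cum_freqs, median)
```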
Alternatively for part i, follow the instructions on the next page to use a calculator’s
statistics mode to calculate the mean of the golf scores.
37 = 38 = , etc. to enter in 2 M+
2 = 4 = , etc. to enter in 4 M+
FREQ column
etc.
AC to leave table
STAT EDIT
( x = 38.888 8…) these settings (if different): andtype ‘L1, L2’ by pressing
Calculate the sum of 1Var XList: List 1 2nd 1 ’ 2nd 2
scores.
1Var Freq: 1 ENTER to calculate many
(Σx = 700) statistics (scroll down for more)
EXE
Check the number of
1VAR to calculate many
scores.
statistics (scroll down for
(n = 18) more)
EXAMPLE 3
The ages of the patients at a medical centre in one afternoon were recorded and grouped into this frequency table.
Age Frequency
0–9 8
10–19 7
20–29 6
30–39 8
40–49 5
50–59 4
60–69 3
70–79 1
a Calculate, correct to one decimal place, the estimated mean age of the patients.
b How many patients went to the medical centre?
Solution
a
Age Class centre, x Frequency, f fx
0–9 4.5 8 36
10–19 14.5 7 101.5
20–29 24.5 6 147
30–39 34.5 8 276
40–49 44.5 5 222.5
50–59 54.5 4 218
60–69 64.5 3 193.5
70–79 74.5 1 74.5
Totals ∑f = 42 ∑fx = 1269
\( \text{Estimate of the mean, } \bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{1269}{42} = 30.214\,2\ldots \approx 30.2 \)
Note that the estimated mean age of 30.2 is a central value of the data set.
b 42 patients (Σf = 42)
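For grouped data, the only new step is finding each class centre. A minimal Python sketch of the table above follows. It assumes each class centre is the average of the two class limits, which matches the centres 4.5, 14.5, ... used in the solution.

```python
# Age classes (lower limit, upper limit) and their frequencies
classes = [(0, 9), (10, 19), (20, 29), (30, 39),
           (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [8, 7, 6, 8, 5, 4, 3, 1]

# Class centre = average of the class limits
centres = [(low + high) / 2 for low, high in classes]

sum_f = sum(freqs)                                    # number of patients
sum_fx = sum(f * x for f, x in zip(freqs, centres))   # total of the fx column
estimated_mean = sum_fx / sum_f

print(sum_f, sum_fx, round(estimated_mean, 1))
```

The result is only an estimate because the individual ages inside each class are not known.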
EXAMPLE 4
The monthly call costs of a sample of mobile phone users were grouped as shown in the cumulative frequency table below.
Call cost ($) Frequency Cumulative frequency
0– < 20 6 6
20– < 40 8 14
40– < 60 13 27
For this data, find:
a the median class
Solution
a There are 120 scores. The two middle scores are the 60th and 61st scores.
From the cumulative frequency column, the 60th and 61st scores are in the
80–< 100 class.
The median class is 80 – < 100.
b The modal class is 80 – < 100. This class has the highest frequency, 23.
EXAMPLE 5
Which measure of central tendency is most appropriate for describing each of the
following averages?
a the average price of a new car
Solution
a Median, because there would be many outliers (the prices of expensive cars).
d Median, because there would be many outliers (the incomes of very rich people).
Ten houses were sold this week at Nelson Lakes for the following prices.
$376 000 $1 200 000 $270 000 $308 000 $372 000
$409 000 $387 000 $582 000 $460 000 $238 000
Solution
a Mean, \( \bar{x} = \dfrac{4\,602\,000}{10} = 460\,200 \), so the mean price is $460 200.
b Prices in order:
$238 000 $270 000 $308 000 $372 000 $376 000
$387 000 $409 000 $460 000 $582 000 $1 200 000
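A short calculation shows how strongly one very large score pulls the mean. The Python sketch below uses the ten sale prices above. The comparison with the $1 200 000 sale removed is our own illustration and is not part of the original solution.

```python
from statistics import mean, median

prices = [376_000, 1_200_000, 270_000, 308_000, 372_000,
          409_000, 387_000, 582_000, 460_000, 238_000]

mean_all = mean(prices)
median_all = median(prices)   # average of the 5th and 6th ordered prices

# Illustration only: drop the single largest sale and recalculate
others = sorted(prices)[:-1]
mean_without = mean(others)
median_without = median(others)

print(mean_all, median_all)
print(mean_without, median_without)
```

Removing the one expensive house moves the mean by tens of thousands of dollars but barely moves the median, which is why the median is preferred when outliers are present.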
1
b 37 31 35 39 31 32 34 32 35 38
c 28 40 38 42 45 29 31 41 30
d 5 8 14 9 10 7 11 15 8 7 5
3 Ngaire is training for a triathlon. She swam the following times, in minutes,
in her last 10 races.
28 34 22 24 25 24 26 26 24 27
[Graph: Frequency (0 to 3) against Number of calls per day (2 to 7)]
a Draw a frequency table for this data, including an ‘fx’ column.
b Over how many days was the number of calls Elena made recorded?
c Find the mode of this data.
d Find the median of this data.
7 The police used radar to check the speeds of Speed (km/h) Number of cars, f Example
8 The heights of young trees in a section of nursery were measured before planting. The results are shown in the table below. (See Example 4.)
Height (cm) Number of trees
20 – 29 28
30 – 39 45
40 – 49 74
50 – 59 63
60 – 69 24
For this data, find:
a the median class
b the modal class.
10 The weekly wages of the staff at Yen’s restaurant are shown in the frequency table.
Wage ($) Number of employees
100– < 200 5
200– < 300 11
300– < 400 20
400– < 500 4
500– < 600 3
600– < 700 1
a What is the modal class for the wages?
b What is the median class?
11 Decide which M (mean, median or mode) is correct for each of the following.
a This M takes all scores in the data set into account.
b This M is one of the scores if there is an odd number of scores.
c Half of the scores are above this M, the other half are below.
d There can be more than one M in a set of data.
e This M often needs to be rounded to decimal places.
f This M can also be used for categorical data.
g This M can be distorted by many outliers.
h This M must be one of the scores in the data set.
12 Which measure of central tendency is most appropriate for describing each average? (See Example 5.)
a the average exam mark for the class
b the average shirt size for teenage girls
c the average rent paid for a house in Sydney
d the average screen size of a notebook computer
e the average mass of football players in a team
f the average brand of mobile phone
15 The colours of the new cars sold last week at Huxley Motors were recorded. The results
are shown in the table below.
Colour Black Blue Red Silver White
Frequency 4 7 7 9 12
TECHNOLOGY
Calculating measures of central tendency
Step 1: Open a blank spreadsheet to enter the following temperature data about
Mudgee from Example 1 on page 403.
Step 2: In cell E5, enter the formula =AVERAGE(A2:G3) to calculate the mean
(30.142 85…).
Step 3: In cell E6, enter the formula =MEDIAN(A2:G3) to calculate the median
(31.5).
Step 4: In cell E7, enter the formula =MODE(A2:G3) to calculate the mode (32).
Quartiles
The three quartiles of a data set are those values that separate the data into quarters.
• The lower quartile, Q1 or QL, separates the bottom quarter (25%) of scores from
the rest of the scores.
• The upper quartile, Q3 or QU, separates the top quarter (25%) of scores from the
rest of the scores.
• The middle quartile, Q2, is the median, and separates the two middle quarters.
These speeds (in km/h) were recorded for 11 cars driving along a major country road:
Q1 = 84 Q2 = 93 Q3 = 104
b The scores obtained by a golfer for the first nine holes of a golf course are:
4 3 5 6 4 3 8 6 6
Solution
\( Q_2 = \dfrac{46 + 51}{2} = 48.5 \)
\( Q_1 = 41 \qquad Q_3 = 60 \)
b 3 3 4 4 5 6 6 6 8
\( Q_1 = \dfrac{3 + 4}{2} = 3.5 \qquad Q_3 = \dfrac{6 + 6}{2} = 6 \)
\( Q_2 = 5 \)
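The method used in part b (find the median, then find the median of each half, leaving out the middle score when the number of scores is odd) can be sketched in Python as follows. The function name `quartiles` is our own. Other software may use a different quartile rule and give slightly different answers.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) as the medians of the lower half, all scores and upper half."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # scores below the median position
    upper = ordered[-half:]   # scores above the median position
    return median(lower), median(ordered), median(upper)


# Golf scores from part b
golf = [4, 3, 5, 6, 4, 3, 8, 6, 6]
q1, q2, q3 = quartiles(golf)
print(q1, q2, q3)
```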
Deciles
Quartiles (Q1, Q2 and Q3) separate data into quarters.
Deciles (D1, D2, D3, D4, D5, D6, D7, D8 and D9) separate data into tenths. Deci- means
‘one tenth’.
For example:
• D1 cuts off the lowest 10% of scores.
• D4 cuts off the lowest 40% of scores.
• D9 cuts off the lowest 90% of scores (or the top 10% of scores).
Solution
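Place the values in order first:
44 45 47 48 48 48 49 49 49 50
50 50 51 51 52 52 52 53 55 56
With 20 scores, each decile cuts off a further two scores: D1 lies between the 2nd and 3rd scores, D2 between the 4th and 5th, and so on up to D9 between the 18th and 19th.
a \( D_3 = \dfrac{48 + 49}{2} = 48.5 \)
b \( D_5 = \dfrac{50 + 50}{2} = 50 \)
c The median, because it cuts off the lowest 50% of scores.
d \( D_7 = \dfrac{51 + 52}{2} = 51.5 \)
e James’ length must be greater than \( D_9 = \dfrac{53 + 55}{2} = 54 \).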
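The decile working above follows one pattern: count off k tenths of the scores, then average the score on each side of the cut. A minimal Python sketch is given below. The function name `decile` is our own, and the sketch deliberately handles only the case met in this example, where the cut falls exactly between two scores.

```python
lengths = [44, 45, 47, 48, 48, 48, 49, 49, 49, 50,
           50, 50, 51, 51, 52, 52, 52, 53, 55, 56]


def decile(data, k):
    """Return the kth decile when the cut falls exactly between two scores."""
    if not 1 <= k <= 9:
        raise ValueError("k must be from 1 to 9")
    ordered = sorted(data)
    n = len(ordered)
    if (k * n) % 10 != 0:
        # Assumption: other cases are outside the scope of this sketch
        raise ValueError("this sketch needs k tenths of the scores to be a whole number")
    m = k * n // 10   # number of scores cut off below the decile
    return (ordered[m - 1] + ordered[m]) / 2


print(decile(lengths, 3), decile(lengths, 5), decile(lengths, 7), decile(lengths, 9))
```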
Percentiles
Percentiles (P1, P2, P3, ... P99) separate data into hundredths.
For example:
• P24 cuts off the lowest 24% of scores
• P60 cuts off the lowest 60% of scores
• P87 cuts off the lowest 87% of scores (or the top 13% of scores).
Deciles and percentiles are only meaningful when analysing large sets of data.
The following information is based on population data for the heights of girls aged
16 years.
• The median is 163 cm.
• The 3rd quartile Q3 = 167 cm.
• The 9th decile D9 = 171 cm.
• The 5th percentile P5 = 152 cm.
• The 97th percentile P97 = 175 cm.
In the following questions, all of the girls mentioned are aged 16.
a Holly’s height is 175 cm. Is she tall for her age and what percentage of 16-year-old
girls are taller than her?
b Olga is taller than 90% of girls her age. What is her height?
c If \( \frac{1}{4} \) of girls her age are taller than Verity, how tall is she?
d What height separates the bottom 5% of heights from the top 95%?
e What percentile is a height of 163 cm?
Solution
a Yes, P97 = 175 cm, which means Holly is taller than 97% of girls her age. So only
3% of girls aged 16 are taller than her.
b Olga’s height = P90 = D9 = 171 cm.
c Verity is taller than \( \frac{3}{4} \) of girls her age, so her height is P75 = Q3 = 167 cm.
d P5 = 152 cm
e 163 cm is the median, so it is also the 50th percentile P50 (the height that cuts off the lowest 50% of scores). The median is the 2nd quartile Q2, the 5th decile D5 and the 50th percentile P50.
1 Find the quartiles Q1, Q2 and Q3 for each data set below. (See Example 7.)
a The times, in seconds, to run 100 metres:
8.7 9.1 11.0 13.5 10.6 8.9 10.1 9.6
12.3 9.9 9.0 10.8 9.2 13.1 10.6
2 The stem-and-leaf plot on the right shows the game Stem Leaf
scores of a group of ten-pin bowlers. 8 2 7 8
For this data, find: 9 0 3 4 6 9
a the median 10 4 4 5 8 8 8
C 8.5 D 8
4 The percentage scores of a class of 30 students in a science test are shown below. (See Example 8.)
61 75 46 78 81 95 67 61 50 74
100 57 83 64 69 95 85 89 66 45
71 87 84 80 63 92 64 75 97 60
a What is the 8th decile?
b What is the 3rd decile?
c What is the 40th percentile?
d Find the value that cuts off the lower 20% of scores from the upper 80%.
e What percentage of students scored higher than 79?
5 For the data shown in the dot plot in Question 3, find:
a the 1st decile
b the 5th decile
c the value that cuts off the lower 70% of scores
d the value that cuts off the top 60% of scores
e the 90th percentile.
7 The information below is based on weather records kept by the Bureau of Meteorology
for the maximum daily temperatures in November for Newcastle.
• The mean is 23.5°C.
• The highest temperature on record was 41.0°C (on 19 November 1968).
• The lowest temperature on record was 15.6°C (on 19 November 1986).
• The 1st decile D1 = 18.9°C.
• The 9th decile D9 = 28.6°C.
© Copyright Commonwealth of Australia 2017, Bureau of Meteorology
8 True or false?
a P75 = Q1 b P60 = D6 c P50 = Median
d Q3 = P75 e D8 = P20 f Q2 = D5
Percentile 40 50 60 70 80 85 90 95 99 100
ATAR 61.65 68.65 75.25 81.60 87.85 90.90 93.95 96.95 99.40 99.95
© 2016 Universities Admissions Centre (NSW & ACT)
10 This table shows the percentiles for the heights (in cm) of girls aged 2 to 5 years,
according to the child growth standards of the World Health Organization (WHO).
[Growth chart: Height (cm), from 75 to 155, against Age (years), from 2 to 20]
Source: Developed by the National Center for Health Statistics in collaboration with the National Center for Chronic Disease Prevention and Health Promotion (2000) http://www.cdc.gov/growthcharts
a Adam is aged 9 and 129 cm tall. What percentage of boys his age are shorter than him?
b Justin is 11 years old and 155 cm tall. What percentage of boys his age are shorter
than him?
c How tall should Justin be when he turns 18?
d Liong is 103 cm tall, which is at the 1st decile for boys his age. How old is Liong?
e Asam is 16 and his height is at the 3rd quartile.
i What is Asam’s height now?
ii What will Asam’s height be when he turns 20 years old?
TECHNOLOGY
Calculating quartiles and percentiles
A spreadsheet can be used to calculate the quartiles, deciles and percentiles
of a set of data.
Step 1: Open a blank spreadsheet to enter data in rows 2 and 3 as shown using the
infant lengths from Example 8 on page 419.
Step 2: In cell F5, enter the formula = QUARTILE(B2:K3,1) to calculate Q1 = 48.
Step 3: In cell F6, enter = QUARTILE(B2:K3,3) to calculate Q3 = 52.
Step 4: In cell F7, enter = PERCENTILE(B2:K3,0.2) to calculate D2 = 48.
Step 5: In cell F8, enter = PERCENTILE(B2:K3,0.7) to calculate D7 = 51.3.
Step 6: In cell F9, enter = PERCENTILE(B2:K3,0.32) to calculate P32 = 49.
Step 7: In cell F10, enter = PERCENTILE(B2:K3,0.95) to calculate P95 = 55.05.
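The decimal answers in Steps 5 and 7 arise because the spreadsheet estimates a position between two ordered scores. The Python sketch below assumes the spreadsheet interpolates linearly between neighbouring scores. The function name `percentile` is our own, and with this assumption the six values quoted in the steps are reproduced for the 20 infant lengths.

```python
lengths = [44, 45, 47, 48, 48, 48, 49, 49, 49, 50,
           50, 50, 51, 51, 52, 52, 52, 53, 55, 56]


def percentile(data, p):
    """Value at fraction p (0 to 1), interpolating linearly between ordered scores."""
    ordered = sorted(data)
    position = p * (len(ordered) - 1)   # 0-based position, may fall between scores
    lower = int(position)
    fraction = position - lower
    if lower + 1 >= len(ordered):
        return float(ordered[-1])
    return ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])


for p in (0.25, 0.75, 0.2, 0.7, 0.32, 0.95):
    print(p, round(percentile(lengths, p), 2))
```

This is not the same rule as the averaging method used in the worked examples, so the two approaches can give slightly different answers for the same data.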
EXAMPLE 10
For each data set below, find:
i the range ii the interquartile range.
a The maximum daily temperature (in °C) in Mudgee for the first two weeks in
January:
30 28 26 31 34 35 32 33 21 25 28 32 32 35
b The body temperatures (in °C) of a sample of hospital patients, as shown in the dot plot.
[Dot plot: Patients’ temperatures, axis from 36 to 42 °C]
Solution
a i Range = 35 – 21 = 14
ii Placing the scores in order:
21 25 26 28 28 30 31 32 32 32 33 34 35 35
Q1 = 28, Q3 = 33
IQR = Q3 – Q1 = 33 – 28 = 5
[Dot plot: Patients’ temperatures, axis from 36 to 42 °C]
ii \( Q_1 = \dfrac{37 + 37}{2} = 37, \quad Q_3 = \dfrac{38 + 39}{2} = 38.5 \)
IQR = Q3 – Q1 = 38.5 – 37 = 1.5
The range represents the total spread of scores but it is not a good measure if there are
outliers. The interquartile range is not affected by outliers, because it measures the range of
the middle two quarters only.
[Diagram: the range spans all of the scores; the interquartile range spans the middle 50%, with 25% of the scores on either side]
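The difference between the two measures shows up clearly in a quick calculation. The Python sketch below finds the range and interquartile range of the Mudgee temperatures from Example 10, then repeats the calculation with one hypothetical mis-recorded reading (a 35 entered as 53). That altered value is our own invention for illustration, and the function name `spread` is also our own.

```python
from statistics import median


def spread(data):
    """Return (range, interquartile range) using the median-of-halves quartiles."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    return ordered[-1] - ordered[0], q3 - q1


temps = [30, 28, 26, 31, 34, 35, 32, 33, 21, 25, 28, 32, 32, 35]

# Hypothetical data entry error: the final 35 is recorded as 53
with_outlier = temps[:-1] + [53]

print(spread(temps))
print(spread(with_outlier))
```

The range more than doubles when the outlier is present, while the interquartile range stays the same.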
A 2.5 B 3 C 5 D 8
5 Fifteen job applicants took a short general knowledge multiple-choice quiz. Their times
(in seconds) to complete this test were:
45 37 46 34 26 15 35 61
43 48 52 38 30 44 37
6 This stem-and-leaf plot below represents the number of points per match scored by the GWS Giants in a football season.
Stem Leaf
7 8 9
8 3 5 8 9
9 1 2 3 5 8
10 0 5
11 1 7
12 6 7 9
13
14 6 9
15 1 8
Which of the following is the interquartile range of this data set? Select A, B, C or D.
A 80 B 38 C 44 D 42
Outliers
An outlier is a score that is either:
• less than Q1 − 1.5 × IQR or
• greater than Q3 + 1.5 × IQR
where Q1 (or QL) is the lower quartile, Q3 (or QU) is the upper quartile, and IQR is the interquartile range.
This is only one of many ways of determining whether a score is an outlier.
EXAMPLE 11
11 8 12 12 15 13 10 25 12 11 7
10 13 16 10 12 16 11 12 16 17 20
Test which scores are outliers.
Solution
7 8 10 10 10 11 11 11 12 12 12 12 12 13 13 15 16 16 16 17 20 25
Q1 = 11, Q3 = 16
IQR = Q3 − Q1 = 16 − 11 = 5
∴ 1.5 × IQR = 1.5 × 5 = 7.5
Q3 + 1.5 × IQR = 16 + 7.5 = 23.5
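The outlier test can be applied to every score at once. The Python sketch below uses the 22 scores from this example and the 1.5 × IQR rule defined above. The function name `find_outliers` is our own, and the quartiles are found by the median-of-halves method used in this chapter.

```python
from statistics import median


def find_outliers(data):
    """Return (lower limit, upper limit, outliers) using the 1.5 x IQR rule."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    low = q1 - 1.5 * iqr
    high = q3 + 1.5 * iqr
    return low, high, [x for x in ordered if x < low or x > high]


scores = [11, 8, 12, 12, 15, 13, 10, 25, 12, 11, 7,
          10, 13, 16, 10, 12, 16, 11, 12, 16, 17, 20]

low, high, outliers = find_outliers(scores)
print(low, high, outliers)
```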
EXAMPLE 12
Solution
a \( \text{Mean} = \dfrac{535}{14} \approx 38.2 \)
Mode = 39
\( \text{Median} = \dfrac{38 + 38}{2} = 38 \) (the average of the 7th and 8th scores)
b From the dot plot, Q1 = 37, Q3 = 39
[Dot plot: Temperature, axis from 36 to 43 °C, with Q1 and Q3 marked]
IQR = Q3 − Q1 = 39 − 37 = 2
1.5 × IQR = 1.5 × 2 = 3
If 43 is an outlier, it must be greater than Q3 + 1.5 × IQR.
Q3 + 1.5 × IQR = 39 + 3 = 42
Since 43 > 42, the outlier is 43.
3 2 0 0 1 2 3 2 4 8 11
2 3 5 2 1 3 4 4 2 3
a Find the interquartile range.
b Find the value of:
i Q1 − 1.5 × IQR ii Q3 + 1.5 × IQR
c Is the score of 8 goals an outlier? Give reasons.
3 The employees at the Bread and Butter Cafe earned the following wages in a week. (See Example 12.)
$450 $520 $610 $230 $900 $420 $590
a What is the mean wage?
b What is the median wage?
c Find the interquartile range.
d The manager’s wage is an outlier. What is this wage and how do we verify that it is
an outlier?
e If the manager’s wage is not included, how does this affect the mean and median
wage?
f If each employee receives a 10% pay rise, what will be the new mean and median
wage? Is it 10% more than the old mean and median?
5 A group of friends goes to the cinema. The ages of the group are
13 12 11 14 12 15 14 13.
If Kait brings her 5-year-old sister as well, what will happen? Select A, B, C or D.
A The median age increases. B The median age decreases.
C The mean age increases. D The mean age decreases.
6 In a netball tournament of five matches, the points scored by three teams are:
The Wombats 24 18 14 6 22
The Possums 16 16 15 18 15
The Koalas 36 8 14 16 12
a What are the mean and median scores for each team?
b Which team is the most consistent? Why?
c An error was made in the scoring for the Wombats – the score of 6 should have
been 16. What are the new mean and median?
d Which team is most consistent now? Why?
7 Sam and Terri sell copiers. The numbers of copiers that they sell each week are sorted in
ascending order.
Sam 1 2 3 3 5 6 7 8 12 25
Terri 3 3 3 14 16 18 18 24 32 35
[Dot plot: Accidents/month, axis from 0 to 9]
9 Rupert’s bookstore employs the following people with annual wages as shown.
1 store manager $73 800
2 cashiers $34 200 each
2 part-time clerical staff $28 500 each
3 salespeople $46 500 each
2 part-time cleaners $13 500 each
a Find the mean, median and modal annual salary for the 10 employees.
b Which measure of central tendency would Rupert use to make the salaries appear
higher? Why?
c Which measure best represents the average wage for an employee at Rupert’s
bookstore? Why?
However, his advice was ignored and the outlier was not considered important enough
to delay the flight. The Challenger exploded just after takeoff, killing all seven astronauts.
Later it was found that two rubber O-rings had failed to seal a joint at low temperatures,
causing the shuttle to disintegrate.
Give another example of when an outlier should not be ignored.
EXAMPLE 13
The maximum daily temperatures (in °C) in Campbelltown in June were recorded and
grouped into the frequency table.
Solution
[Ogive: cumulative frequency (0 to 30) against Temperature (°C), from 12 to 21, with median = 16 and Q1 = 14 marked]
EXAMPLE 14
Solution
[Ogive: cumulative frequency against Temperature (°C), from 12 to 21, with D1 = 13.5, D3 = 14.5, D4 = 16, D5 = 16 and D7 = 18 marked]
b D8 cuts off the top 20% of temperatures, so the value is 18.
c Between D1 and D3.
The number of cases of ovarian cancer in women from various age groups is shown below.
Draw an ogive for this data and use it to find an estimate for:
a the median b the 3rd quartile
c the 9th decile d the interquartile range.
Solution
[Ogive: Cases of ovarian cancer; Cumulative frequency (0 to 320) against Age (years), from 35 to 85, with Q1 = 53, Median = 66, Q3 = 74 and D9 = 80 marked]
2 This ogive shows the speeds of motor vehicles travelling along the main street of a town. (See Example 14.)
[Ogive: Speed of motor vehicles on main street; Cumulative frequency (0 to 25) against Speed (km/h), from 10 to 80]
a How many vehicles were in the survey?
b Estimate the median speed of the vehicles.
c Estimate the interquartile range.
d Estimate the 9th decile.
4 The heights of 50 students were measured and grouped into class intervals. (See Example 15.)
Height (cm) Class centre Frequency Cumulative frequency
134 – < 141 2
141 – < 148 3
148 – < 155 4
155 – < 162 13
162 – < 169 15
169 – < 176 11
176 – < 183 2
range, while the ‘whiskers’ represent the lowest and highest 25% of scores.
[Diagram: box plot with the box, whiskers and interquartile range labelled]
A box plot gives a five-number summary of a data set:
• the lower extreme (lowest score)
EXAMPLE 16
21 13 64 75 35 83 7 71 18 29
a Find a five-number summary for this data.
b Represent this data on a box plot.
Solution
a In order:
7 13 18 21 29 35 64 71 75 83
Lower extreme = 7
Lower quartile = 18
\( \text{Median} = \dfrac{29 + 35}{2} = 32 \)
Upper quartile = 71
Upper extreme = 83
The five-number summary for the ages is 7, 18, 32, 71, 83.
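The five-number summary is simply the two extremes together with the three quartiles, so it can be sketched in a few lines of Python. The function name `five_number_summary` is our own, and the quartiles use the median-of-halves method from this chapter.

```python
from statistics import median


def five_number_summary(data):
    """Return (lower extreme, Q1, median, Q3, upper extreme)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return (ordered[0],
            median(ordered[:half]),
            median(ordered),
            median(ordered[-half:]),
            ordered[-1])


ages = [21, 13, 64, 75, 35, 83, 7, 71, 18, 29]
summary = five_number_summary(ages)
print(summary)
```

These five values are exactly the points needed to draw the box plot in part b: the ends of the whiskers, the ends of the box, and the line inside the box.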
This box plot represents the amount of pocket money in dollars earned by a
sample of 48 children.
[Box plot: Pocket money ($), axis from 5 to 45]
a Find the median.
b Find the range.
c How many children earned between:
i $33 and $42? ii $15 and $42?
d Find the interquartile range.
Solution
a Median = $22
EXAMPLE 18
The mean maximum monthly temperatures for Sydney and Melbourne are
shown in this table.
Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Sydney 25.9 25.8 24.8 22.5 19.5 17.0 16.4 17.9 20.1 22.2 23.7 25.2
Melbourne 26.0 25.8 23.9 20.3 16.7 14.1 13.5 15.0 17.3 19.7 22.0 24.2
© Copyright Commonwealth of Australia 2017, Bureau of Meteorology
Solution
a In order:
Sydney
16.4 17.0 17.9 19.5 20.1 22.2 22.5 23.7 24.8 25.2 25.8 25.9
Lower extreme = 16.4
\( \text{Lower quartile} = \dfrac{17.9 + 19.5}{2} = 18.7 \)
\( \text{Median} = \dfrac{22.2 + 22.5}{2} = 22.35 \)
\( \text{Upper quartile} = \dfrac{24.8 + 25.2}{2} = 25.0 \)
Upper extreme = 25.9
Melbourne
13.5 14.1 15.0 16.7 17.3 19.7 20.3 22.0 23.9 24.2 25.8 26.0
Lower extreme = 13.5
\( \text{Lower quartile} = \dfrac{15.0 + 16.7}{2} = 15.85 \)
\( \text{Median} = \dfrac{19.7 + 20.3}{2} = 20.0 \)
\( \text{Upper quartile} = \dfrac{23.9 + 24.2}{2} = 24.05 \)
Upper extreme = 26.0
[Box plot: Melbourne; Temperature (°C), axis from 13 to 26]
2 Fifteen job applicants took a short general knowledge multiple-choice quiz. Their times,
in seconds, to complete this test were as shown below. Show this data on a box plot.
45 37 46 34 26 15 35 61
43 48 52 38 30 44 37
5 This dot plot shows the number of vehicles driving past Westvale High School
per minute in a 20-minute period.
[Dot plot: Number of vehicles per minute, axis from 2 to 10]
a Find the five-number summary for this data and draw a box plot.
b Compare the box plot you drew in part a with the original dot plot. Which one do
you prefer? Why?
6 This box plot represents the annual wages (× $1000) of the administration staff at a
TAFE college.
[Box plot: Annual wages of TAFE administrative staff; Wages (× $1000), axis from 10 to 80]
a One of the wages is an outlier which was not included in the box plot.
What is the outlier?
b What is the median wage?
c Excluding the outlier, what is the range of wages?
d Including the outlier, what is the range of wages?
e Between what two amounts are the middle 50% of staff wages?
f What percentage of the staff earn less than $28 000?
[Parallel box plots: Geography and Modern History; Marks, axis from 35 to 95]
8 Year 12 students at Baramvale High had their pulse taken. The results are as follows.
Male 106 70 69 58 60 68 64 63 75 70 84 88 59 60 66
Female 68 74 59 75 74 82 82 71 120 55 77 91 73 60 79
a Find the five-number summary for each group and draw a parallel boxplot to show
the information.
b Find the range and interquartile range for each group.
c Compare the spread between the two groups. Are there significant differences
between the pulse rates for males and females?
d Which group had the lower pulse rates? Give reasons.
9 The box plot shows the results of tests in Physics and Chemistry.
[Parallel box plots: Physics and Chemistry; Marks, axis from 30 to 90]
In Chemistry, 48 students completed the yearly exam and the number of students who
scored above 50 or more was the same for both subjects.
Standard deviation is a better measure of spread than the range and interquartile range because, like the mean, its value depends on every score in the data set. Standard deviation measures how different each score in a data set is from the mean.
The formula for calculating standard deviation is quite complicated, and does not need to be learnt. Instead, you can use your calculator’s statistics mode.
EXAMPLE 19
Calculate, correct to one decimal place, the standard deviation of each data set below.
a The maximum daily temperature (in °C) in Mudgee for the first two weeks in
January:
30 28 26 31 34 35 32
33 21 25 28 32 32 35
b [Dot plot: Patients’ temperatures, axis from 36 to 42 °C]
a σ = 3.9434… ≈ 3.9
Operation Casio Scientific Sharp Scientific
Refer to page 404 to enter the data.
Calculate the population standard deviation SHIFT 1 Var sx = RCL σx
(σx = 3.9434…)
b σ = 1.5362…
≈ 1.5
To calculate the standard deviation of data presented in a frequency table, refer to the table of
calculator instructions on page 407, then follow the instructions from part a above.
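For readers who want to see what the calculator is doing, the Python sketch below works out the population standard deviation of the Mudgee temperatures in two ways: step by step (the square root of the average squared distance from the mean) and with the standard library function `pstdev`. The sample standard deviation from `stdev` is printed for comparison only. The variable names are our own.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

temps = [30, 28, 26, 31, 34, 35, 32, 33, 21, 25, 28, 32, 32, 35]

# Step by step: average squared distance from the mean, then square root
x_bar = mean(temps)
squared_distances = [(x - x_bar) ** 2 for x in temps]
sigma_by_hand = sqrt(sum(squared_distances) / len(temps))

sigma = pstdev(temps)   # population standard deviation
s = stdev(temps)        # sample standard deviation (divides by n - 1)

print(round(sigma_by_hand, 4), round(sigma, 1), round(s, 1))
```

Because every score contributes a squared distance to the total, changing any one score changes the standard deviation.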
EXAMPLE 20
Thirty-six people were given a concentration task and the times taken (in seconds) to complete the exercise are shown below.
Males 32 44 44 29 40 26 64 21 65 32 42 30 66 51 53 30 55 42
Females 35 35 41 41 49 38 33 44 36 53 28 42 37 35 28 54 60 61
Solution
EXAMPLE 21
18 19 18 17 20 20 24 15 24 19
15 35 15 24 22 19 15 17 23 29
15 40 21 17 20 22 23 21 24 23
22 16 36 15 16 24 16 15 19 15
34 19 45 20 15 21 24 27 19 33
18 27 15 30 15 34 17 29 25 17
a Find, correct to one decimal place, the population mean (µ) and population standard
deviation (σ) of the Burger Haven employees.
b Randomly select three samples of ten ages from this population of employees and for
each sample, calculate (correct to one decimal place) the mean (x ) and the standard
deviation (s).
c Estimate the mean and standard deviation of the population from the statistics of the
three samples.
d How do the estimates of population mean and standard deviation compare with the
answers in part a?
Solution
d The estimates of the population mean and standard deviation (21.5 and 6.7) compare favourably with the population mean and standard deviation (21.9 and 6.8).
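The sampling process in this example can be simulated. The Python sketch below calculates the population statistics for the 60 ages, then draws three random samples of ten. Two things here are our own assumptions: the fixed seed (an arbitrary choice so that a run can be repeated) and the use of the average of the three sample statistics as the population estimate. Different samples will give different estimates, so no particular sample values are quoted.

```python
import random
from statistics import mean, pstdev, stdev

ages = [18, 19, 18, 17, 20, 20, 24, 15, 24, 19,
        15, 35, 15, 24, 22, 19, 15, 17, 23, 29,
        15, 40, 21, 17, 20, 22, 23, 21, 24, 23,
        22, 16, 36, 15, 16, 24, 16, 15, 19, 15,
        34, 19, 45, 20, 15, 21, 24, 27, 19, 33,
        18, 27, 15, 30, 15, 34, 17, 29, 25, 17]

mu = mean(ages)        # population mean
sigma = pstdev(ages)   # population standard deviation

rng = random.Random(1)   # assumption: arbitrary fixed seed for repeatability
samples = [rng.sample(ages, 10) for _ in range(3)]
sample_stats = [(mean(s), stdev(s)) for s in samples]

# Assumption: estimate the population values by averaging the sample statistics
estimated_mean = mean(m for m, _ in sample_stats)
estimated_sd = mean(sd for _, sd in sample_stats)

print(round(mu, 1), round(sigma, 1))
print(round(estimated_mean, 1), round(estimated_sd, 1))
```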
1 The number of monthly accidents at a construction site over 8 months was: (See Example 19.)
3 0 4 2 3 0 2 2
a Calculate the mean number of accidents per month.
b Find the standard deviation for the data, correct to one decimal place.
2 An express train from Central Station was late in arriving at Homebush by the following
times (in minutes):
6 0 3 −2 5 −1 0 3 −1 6 7 1
5 Students were surveyed on the number of movies they had downloaded in the last six months, with the results shown in the frequency table.
Score (x) Frequency (f)
0 6
1 7
2 8
3 10
4 9
5 5
a For this data, find the mean, x̄.
b Calculate, correct to one decimal place, the standard deviation.
c How many scores were within one standard deviation of the mean?
7 This table shows the weekly wages of employees at Great Gals electrical store, grouped in classes of $100.
Weekly wage ($) Class centre Frequency
$500 – < $600 7
$600 – < $700 20
$700 – < $800 36
$800 – < $900 17
$900 – < $1000 11
$1000 – < $1100 3
a Copy and complete the table.
b Find, to the nearest cent, an estimate for:
i the mean
ii the standard deviation.
a Find the mean height and sample standard deviation for males and for females.
b Is there a significant difference between the heights of males and females? Give reasons.
9 The results of the first two Maths tests given to a Year 11 class are displayed in the back-to-back stem-and-leaf plot.
Test 1 | Stem | Test 2
4 | 3 | 2
4 3 | 4 | 9
9 8 0 | 5 | 2 7 9
9 8 7 4 0 | 6 |
9 7 5 5 5 3 1 | 7 | 0 1 1 2 4 4 8
9 9 | 8 | 0 1 2 4 5 5 7 8
a Find the mean mark and standard deviation for each test.
b Are there significant differences between the means and standard deviations of the two tests?
c In which test did the students perform better? Justify your answer.
10 A group of men and women were timed on the length of time (in seconds) of the last call
they made on their mobile phone.
11 a As in Example 21, randomly select three samples of ten ages from the population of Burger Haven employees and, for each sample, calculate the mean (x̄) and the sample standard deviation (s).
b Estimate the mean and standard deviation of the population from the statistics of
the three samples.
c How do the estimates of population mean and standard deviation compare with the
answers in part a?
12 a Randomly select three samples of five ages from the Burger Haven employees and,
for each sample, calculate the mean (x) and the sample standard deviation (s).
b Estimate the mean and standard deviation of the population from the statistics of
the three samples.
c How do the estimates of population mean and standard deviation compare with the
answers in part a?
TECHNOLOGY
Calculating measures of spread
Step 1: Open a blank spreadsheet and enter the temperature data about Mudgee from
Example 19 on page 445.
Step 2: In cell E5, enter the formula =MAX(A2:G3) to calculate the
highest score (35).
Step 3: In cell E6, enter =MIN(A2:G3) to calculate the lowest score (21).
Step 4: In cell E7, enter =QUARTILE(A2:G3,3) to calculate the upper quartile, Q3 (32.75).
Step 5: In cell E8, enter =QUARTILE(A2:G3,1) to calculate the lower quartile, Q1 (28).
Note: A spreadsheet calculates quartiles using a slightly different method to the method we have described, so its answers for the interquartile range may not be exactly the same as ours, but they should be close.
Step 6: In cell E10, enter =E5-E6 to calculate the range (14).
Step 7: In cell E11, enter =E7-E8 to calculate the interquartile range (4.75).
Step 8: In cell E12, enter =STDEV.P(A2:G3) to calculate the population standard
deviation.
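The note in the steps above can be checked directly. The Python sketch below finds the quartiles of the Mudgee temperatures twice: once with the median-of-halves method from this chapter, and once with `statistics.quantiles` using `method="inclusive"`, which we assume interpolates in the same way as the spreadsheet because it reproduces the values 28 and 32.75 quoted in the steps. The variable names are our own.

```python
from statistics import median, quantiles

temps = [30, 28, 26, 31, 34, 35, 32, 33, 21, 25, 28, 32, 32, 35]

# Method described in this chapter: medians of the lower and upper halves
ordered = sorted(temps)
half = len(ordered) // 2
book_q1 = median(ordered[:half])
book_q3 = median(ordered[-half:])

# Interpolating method (assumed to match the spreadsheet's QUARTILE function)
sheet_q1, sheet_median, sheet_q3 = quantiles(temps, n=4, method="inclusive")

print(book_q1, book_q3, book_q3 - book_q1)
print(sheet_q1, sheet_q3, sheet_q3 - sheet_q1)
```

The two interquartile ranges, 5 and 4.75, are close but not equal, which is the difference the note describes.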
Frequency
distribution
graph or display.
Frequency
and median.
Frequency
and median.
One example of a negatively skewed distribution is the heights of the players in a basketball team.
[Graph: negatively skewed distribution, Frequency against Score, with the mean, median and mode marked]
Peaks are the high points of the distribution and represent the
more frequent scores. The highest peak is the mode.
The modality is the number of peaks occurring in a distribution. A distribution can have one
peak only (unimodal) or have more than one peak (multimodal).
[Figures: frequency graphs of a unimodal distribution and a multimodal distribution.]
If a distribution is bimodal, it has two peaks. For example, this frequency histogram is
bimodal, having two peaks at 2 and 7. The mode, however, is 7.
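The difference between the peaks and the mode can be checked with a short Python sketch. The frequencies below are hypothetical (they are not read from the histogram), and find_peaks is a name made up for this illustration. A score is counted as a peak only if its frequency is higher than that of both neighbouring scores, so flat-topped peaks are not handled.

```python
# Hypothetical frequency table, score -> frequency (not read from the histogram).
frequencies = {1: 2, 2: 6, 3: 4, 4: 3, 5: 2, 6: 5, 7: 8, 8: 4, 9: 1}


def find_peaks(freq):
    """Return the scores whose frequency is higher than both neighbours."""
    scores = sorted(freq)
    peaks = []
    for i, score in enumerate(scores):
        # Treat the frequency beyond either end of the table as 0.
        left = freq[scores[i - 1]] if i > 0 else 0
        right = freq[scores[i + 1]] if i < len(scores) - 1 else 0
        if freq[score] > left and freq[score] > right:
            peaks.append(score)
    return peaks


peaks = find_peaks(frequencies)
mode = max(frequencies, key=frequencies.get)  # score with the highest frequency

print("peaks:", peaks)            # two peaks, so this data is bimodal
print("modality:", len(peaks))
print("mode:", mode)              # the highest peak
```

With these frequencies there are two peaks, at 2 and 7, but only one mode, 7, because the mode is the highest peak.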
EXAMPLE 22
[Figures for parts c to e: a display of ages on a scale from 2 to 10; a frequency histogram
of ages in the classes 0–4, 5–9, ..., 75–79, 80+ with the frequency axis from 0 to 40; a
display of waiting times (min) on a scale from 15 to 40.]
Solution
c i symmetrical
ii multimodal, peaks at 3, 5, 7 and 9
iii no clusters
d i positively skewed (tail points towards the higher ages)
ii unimodal class, 1 peak
iii cluster from 15 to 29
e i positively skewed (tail points towards the right)
ii Unable to determine since individual scores are not known.
iii cluster from 15–17 min (25% of patients)
EXAMPLE 23
The daily maximum temperatures for Sydney and Brisbane for December are
shown below.
[Figures: displays of daily maximum temperature (°C) for Sydney and for Brisbane, each on
a scale from 18 to 40.]
a Find the mean, the median and modal temperatures for each city.
b Find the range, interquartile range and standard deviation for each city.
c Describe the shape of the distribution of temperatures for each city and identify any
outliers and clusters.
d Compare the temperatures in Sydney and Brisbane. Comment on measures of central
tendency and measures of spread.
Solution
d Brisbane is the warmer city, as shown by the mean, median and mode, which are 2–3°C
above those of Sydney.
The spread of Sydney’s temperatures is significantly greater than Brisbane’s as shown
by larger values of the range, interquartile range and standard deviation. Sydney also
had the lowest and the highest temperatures in December.
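A comparison like this one can be organised by calculating the same summary statistics for each data set. The Python sketch below uses two small hypothetical data sets (not the Sydney and Brisbane temperatures), and summarise is a name made up for this illustration. Its quartiles use the interpolating "inclusive" method, which may differ slightly from the method described in this chapter.

```python
import statistics

# Hypothetical daily maximum temperatures in °C (NOT the Example 23 data).
city_a = [19, 22, 24, 25, 25, 26, 27, 29, 31, 36]
city_b = [26, 27, 28, 28, 28, 29, 29, 30, 30, 31]


def summarise(data):
    """Return measures of central tendency and spread for one data set."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": median,
        "mode": statistics.multimode(data),  # a list, as there may be several
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "sd": statistics.pstdev(data),
    }


summary_a = summarise(city_a)
summary_b = summarise(city_b)

for name, summary in (("City A", summary_a), ("City B", summary_b)):
    print(name)
    for measure, value in summary.items():
        print(f"  {measure}: {value}")
```

In this made-up data, City B has the higher mean, median and mode, while City A has the larger range, interquartile range and standard deviation, so City B is warmer and City A is more variable.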
[Figure a: a display on a scale from 0 to 10.]
[Figure b: a frequency graph of scores from 1 to 11, with the vertical axis from 0 to 6.]
c
Stem  Leaf
1     3 4 6 6 6 7 8 9 9
2     0 7
3     1 2 2 5 7 8 8 9
4     0 2 3
5     2 9
[Figure d: a display on a scale from 10 to 20.]
e
Stem  Leaf
4     1 3
5     5 5 6
6     0 3 5 5 6 8
7     2 6
8     5 5 8
[Figure f: a frequency graph of scores from 5 to 50, with the frequency axis from 0 to 9.]
Hour  1201–1300  1301–1400  1401–1500  1501–1600  1601–1700  1701–1800  1801–1900  1901–2000  2001–2100
Hits  1300       800        400        2100       2500       4500       3900       5300       2300
[Figures: displays of data sets X and Y, each on a scale from 3 to 7.]
A Y is positively skewed.
B X does not have a mode.
C The mean of Y is 5.
D X and Y are both symmetrical.
6 These are the ages of employees at the Berry Good Biscuit factory.
16 36 15 16 15 19 55 59 18 20 50 22 21 35 22 19 15 17 43 49
a Draw a stem-and-leaf plot for this data.
b Comment on the shape of the distribution, mentioning skewness, peaks and
clusters.
7 This dot plot represents the number of accidents per month at a factory over a year.
[Dot plot: accidents per month, on a scale from 0 to 9.]
9 The results of a Year 12 Maths exam are shown on the parallel box plot
below.
[Parallel box plot: test results for classes 12W and 12X, on a scale from 20 to 80.]
10 A Year 11 Biology class was asked to estimate their test results before completing the
test. The estimates and actual test results are shown below.
Estimates 87 80 83 65 82 82 92 73 82 89
93 77 70 65 85 33 87 77 78 75
88 89 86 58
Test results 80 73 86 52 91 91 72 64 91 87
79 46 78 85 82 32 87 73 79 86
95 79 49 73
Study tip
Looking after yourself
• While studying, don’t forget to keep it all in perspective.
• Remember to have your own life outside school.
• Look after your physical and mental health.
• Eat properly and have enough sleep.
• Exercise regularly, play sport and go out.
• Plan to do nothing occasionally.
• Relax and rest regularly.
• Talk to your family, visit your friends.
• Be positive and sensible.
• Have confidence in yourself and don’t stress.
• Don’t worry, be happy.
This chapter, Analysing data, examined the statistical measures of central tendency (mean,
median, mode) and spread (range, interquartile range, standard deviation). You should
be competent at making statistical calculations on sets of numerical data, including those
represented in frequency tables, class intervals (grouped data), dot plots and
stem-and-leaf plots. Make sure you know how to use the statistical functions of your
calculator. You should understand the new concepts of quantiles (quartiles, deciles and
percentiles), be able to interpret cumulative frequency graphs and construct box plots
using a five-number summary. You must also be able to describe, compare and interpret
data sets in terms of modality, shape (symmetry and skewness), measures of central
tendency and spread, and also look at the effect of outliers.
Make a summary of this topic. Use the outline at the start of this chapter as a guide. An
incomplete mind map is shown below. Use your own words, symbols, diagrams, boxes and
reminders. Gain a ‘whole picture’ view of the topic and identify any weak areas.
[Mind map: ANALYSING DATA at the centre, with the branches:
• Measures of central tendency
• Measures of spread and outliers
• Quantiles: deciles, quartiles and percentiles
• Shape of data sets
• Box plots
• Cumulative frequency graphs
• Comparing data sets]
Exercise 10.01
1 The heights (in centimetres) of a group of ballet dancers are:
165 183 170 168 175 179 168 170
181 168 172 177 171 170 175 179
a Calculate the mean, correct to one decimal place.
b Find the median height.
c What is the mode?
Exercise 10.01
2 Motor vehicles were clocked, by police radar, travelling at the following
speeds (in km/h):
78 95 64 77 81 84 77 89 90 78
79 80 82 84 80 79 95 86 84 70
78 65 82 91 89 60 85 81 78 68
90 84 69 70 80 91 85 84 80 76
68 65 85 76 79 83 82 91 84 80
a Sort the data in a frequency table using classes of 60–< 70, 70–< 80, and so on, and
include a column of class centres.
b Calculate an estimate for the mean speed.
c Find the median class of speeds.
d What is the modal class?
Exercise 10.01
5 Which measure of central tendency is most appropriate for describing each average
below? Give a reason for each answer.
a The average men’s shoe size
b The average height of Year 11 students
c The average starting salary of an Australian worker
Exercise 10.02
a This score was above the 7th decile, D7. Approximately what percentage of students
taking the test scored lower than her?
b More specifically, Simone’s score was at the 78th percentile, P78. What percentage
of students scored higher than her?
Exercise 10.03
b A random sample of 15 packets of corn chips had the following masses in grams.
Find the range and interquartile range of these masses.
52 51 50 49 50 50 48 51
51 50 49 53 50 49 51
9 This stem-and-leaf plot on the right represents the Stem Leaf Exercise
Exercise 10.04
10 In a small business, eight employees earn the following wages per week.
$1026 $874 $950 $950 $980 $1140 $1216 $1710
Is the wage of $1710 an outlier for this set of data? Justify your answer with calculation.
4 7 8 8 12 15 19 20 10.04
the assignment?
b Use the graph to estimate:
i the median
ii the interquartile range
iii the 6th decile
iv the 45th percentile.
[Figure: cumulative frequency graph of marks from 2 to 10, with the cumulative frequency
axis marked from 12 to 28.]
Exercise 10.06
15 a Create a five-number summary for the corn chip packet masses in Question 8b.
b Represent the mass data on a box plot.
[Figure: displays of English and History marks on a common scale from 10 to 100.]
Exercise 10.07
17 For quality testing, a manufacturer takes a random sample of 10 screws, each designed to
have a length of 2 cm. The actual lengths of the screws, in centimetres, are:
2.00 1.99 1.98 2.01 2.01 1.97 2.03 1.98 2.01 2.00
a Find the mean screw length.
b Find the standard deviation, correct to two decimal places.
Exercise 10.07
18 For the shoe data from Question 12, calculate (correct to one decimal place):
a the mean
b the standard deviation.
19 The results for the multiple-choice section in two tests taken by a Year 11 Mathematics Exercise
[Figure: back-to-back frequency display for Test 1 and Test 2, with scores from 1 to 10 on
the central axis and frequency from 0 to 10 on each side.]