c12UnivariateData (1) (1)
LESSON SEQUENCE
12.1 Overview
12.2 Measures of central tendency
12.3 Measures of spread
12.4 Box plots
12.5 The standard deviation (Optional)
12.6 Comparing data sets
12.7 Populations and samples
12.8 Evaluating inquiry methods and statistical reports
12.9 Review
LESSON
12.1 Overview
Why learn this?
According to the novelist Mark Twain, ‘There are three kinds of lies: lies, damned lies
and statistics.’ Statistics can easily be used to manipulate people unless they have an
understanding of the basic concepts involved.
Statistics, when used properly, can be an invaluable aid to good decision-making.
However, deliberate distortion of the data or meaningless pictures can be used to
support almost any claim or point of view. Whenever you read an advertisement, hear
a news report or are given some data by a friend, you need to have a healthy degree
of scepticism about the reliability of the source and nature of the data presented. A
solid understanding of statistics is crucially important, as it is very easy to fall prey to
statistics that are designed to confuse and mislead.
In 2020 when the COVID-19 pandemic hit, news and all forms of media were flooded with statistics. These
statistics were used to inform governments worldwide about infection rates, recovery rates and all sorts of other
important information. These statistics guided the decision-making process in determining the restrictions that
were imposed or relaxed to maintain a safe community.
Statistics are also used to provide more information about a population in order to inform government policies.
For example, the results of a census might indicate that the people in a particular city are fed up with traffic
congestion. With this information now known, the government might prioritise works on public roads, or
increase funding of public transport to try to create a more viable alternative to driving.
1. The following data show the number of cars in each of the 12 houses along a street.
2, 3, 3, 2, 2, 3, 2, 4, 3, 1, 1, 0
2. Calculate the range of the following data set: 5, 15, 23, 6, 31, 24, 26, 14, 12, 34, 18, 9, 17, 32.
3. The frequency table shows the scores obtained by 102 professional golfers in the final round of
a tournament.
Score Frequency
67 2
68 6
69 7
70 11
71 16
72 23
73 17
74 11
75 9
4. A sample of 15 people was selected at random from those attending a local swimming pool. Their ages
(in years) were recorded as follows.
19, 7, 83, 41, 17, 23, 62, 55, 15, 25, 32, 29, 11, 18, 10
Calculate the mean age of people attending the swimming pool, correct to 1 decimal place.
6. At Einstein Secondary School a Year 10 mathematics class has 22 students. The following were the test
scores for the class.
34, 47, 54, 59, 60, 63, 66, 69, 73, 77, 78
78, 79, 80, 82, 83, 85, 86, 88, 89, 90, 91
7. The mean of a set of five scores is 11.8. If four of the scores are 17, 9, 14 and 6, calculate the fifth score.
6 7 8 9 10 11 12 13 14
9. A frequency table for the time taken by 20 people to put together an item of flat-pack furniture
is shown.
Calculate the cumulative frequency to put together an item of flat-pack furniture in less than
20 minutes.
10. MC The frequency table below shows the scores obtained by 100 professional golfers in the final round
of a tournament.
Score Frequency
67 2
68 5
69 8
70 11
71 16
72 22
73 14
74 13
75 9
0 2 2 2 1 1 3 4 4 2 1
2 4 1 6 3 3 5 4 1 2 5
13. MC Select the approximate median in the cumulative frequency percentage graph shown.
[Figure: percentage cumulative frequency curve. Vertical axis: Cumulative frequency (%), 0 to 100; horizontal axis: Mass (g), 10 to 70]
A. 30 B. 32 C. 40 D. 50 E. 92
14. MC The following back-to-back stem-and-leaf plot shows the typing speed in words per minute (wpm)
[Figure: column graph. Vertical axis: Number of votes, 950 to 990; horizontal axis: Favourite food (Hamburgers, Pizza, Chicken wings)]
Graph 2
Favourite take-away foods
[Figure: column graph. Vertical axis: Number of votes, 0 to 1200; horizontal axis: Favourite food (Hamburgers, Pizza, Chicken wings)]
The mean
• The mean of a set of data is what is referred to in everyday language as the average.
• The mean of a set of values is the sum of all the values divided by the number of values.
• The symbol we use to represent the mean is $\bar{x}$; that is, a lower-case x with a bar on top.
$$\bar{x} = \frac{\sum x}{n}$$
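As a minimal sketch, the rule for the mean can be written in Python. The function name `mean` is an illustrative choice, and the sample list is the car data from question 1 earlier in this topic.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Number of cars at each of the 12 houses along a street.
cars = [2, 3, 3, 2, 2, 3, 2, 4, 3, 1, 1, 0]
print(round(mean(cars), 2))
```

The check for an empty list is a design choice made here so that the function fails with a clear message instead of dividing by zero.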
The median
• The median represents the middle score when the data values are in ascending order such that an equal
number of data values will lie below the median and above it.
1. Arrange the data values in order.
2. The position of the median is the $\left(\frac{n+1}{2}\right)$th data value, where n is the total number of data values.
Note: If there is an even number of data values, there will be two middle values. In this case the median is the average of those two data values.
1 1 3 4 6 7 8
median = 4
• When there are an even number of data values, the median is the average of the two middle values.
2 3 3 5 6 6 7 9
$$\text{median} = \frac{5 + 6}{2} = 5.5$$
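The odd and even cases can be sketched in Python as follows. The function name `median` is an illustrative choice; the two lists are the examples given above.

```python
def median(values):
    """Return the middle value of the ordered data.

    For an even number of values, return the average of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 1, 3, 4, 6, 7, 8]))     # odd number of values
print(median([2, 3, 3, 5, 6, 6, 7, 9]))  # even number of values
```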
The mode
• The mode is the observation that occurs most often.
• The data set can have no modes, one mode, two modes (bimodal) or more than two modes (multimodal).
• If no value in a data set appears more than once then there is no mode.
• If a data set has multiple values that appear the most then it has multiple modes. All values that appear the
most are modes.
For example, the set 1, 2, 2, 4, 5, 5, 7 has two modes, 2 and 5.
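A short Python sketch of these rules is shown below. The function name `modes` is an illustrative choice; it returns an empty list when no value appears more than once, matching the rule stated above.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no value appears more than once, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 2, 2, 4, 5, 5, 7]))
```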
THINK WRITE
a. 1. Calculate the sum of the scores; that is, ∑x.
    a. ∑x = 33
2. Count the number of scores; that is, n.
    n = 8
3. Write the rule for the mean.
    $\bar{x} = \frac{\sum x}{n}$
4. Substitute the known values into the rule.
    $= \frac{33}{8}$
5. Evaluate.
    = 4.125
6. Write the answer.
    The mean is 4.125.
b. 1. Write the scores in ascending order.
    b. 2, 3, 4, 4, 4, 5, 5, 6
2. Locate the position of the median using the rule $\frac{n+1}{2}$, where n = 8. This places the median as the 4.5th score; that is, between the 4th and 5th score.
    Median = $\frac{8+1}{2}$th score = 4.5th score
3. Obtain the average of the two middle scores.
    Median = $\frac{4+4}{2} = \frac{8}{2} = 4$
4. Write the answer.
    The median is 4.
c. 1. Systematically work through the set and make note of any repeated values (scores). Identify the most frequently occurring observation.
    c. 2 3 4 4 4 5 5 6
2. Write the answer.
    The mode is 4.
• To calculate the median, add a cumulative frequency column to the table and use it to determine the score that is the $\left(\frac{n+1}{2}\right)$th data value.
• To calculate the mean, add a column that is the score multiplied by its frequency, f × x. The following formula can then be used to calculate the mean, where ∑(f × x) is the sum of the (f × x) column. ∑ is the uppercase Greek letter sigma.
$$\bar{x} = \frac{\sum (f \times x)}{n}$$
THINK WRITE
a. 1. Rule up a table with four columns titled Score (x), Frequency (f), Frequency × score (f × x) and Cumulative frequency (cf).
    Score (x)   Frequency (f)   Frequency × score (f × x)   Cumulative frequency (cf)
    4           1               4                           1
    5           2               10                          1 + 2 = 3
    6           5               30                          3 + 5 = 8
    7           4               28                          8 + 4 = 12
    8           3               24                          12 + 3 = 15
                n = 15          ∑(f × x) = 96
2. Substitute the known values into the rule and evaluate.
    $\bar{x} = \frac{96}{15} = 6.4$
b. 1. Locate the position of the median using the rule $\frac{n+1}{2}$, where n = 15. This places the median as the 8th score.
    b. The median is the $\left(\frac{15+1}{2}\right)$th or 8th score.
2. Use the cumulative frequency column to find the 8th score and write the answer.
    The median of the data set is 6.
c. 1. The mode is the score with the highest frequency.
    c. The score with the highest frequency is 6.
2. Write the answer.
    The mode of the data set is 6.
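The frequency-table method can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the first table row (score 4, frequency 1) is an assumption inferred from the totals n = 15 and ∑(f × x) = 96 in the worked example.

```python
import math


def score_at(table, k):
    """Return the k-th score (counting from 1) using a running cumulative frequency."""
    cumulative = 0
    for score, frequency in table:
        cumulative += frequency
        if k <= cumulative:
            return score
    raise IndexError("position is beyond the last data value")


def summarise(table):
    """Return (mean, median, modes) for (score, frequency) rows sorted by score."""
    n = sum(frequency for _, frequency in table)
    mean = sum(score * frequency for score, frequency in table) / n
    position = (n + 1) / 2  # may end in .5 when n is even
    median = (score_at(table, math.floor(position))
              + score_at(table, math.ceil(position))) / 2
    highest = max(frequency for _, frequency in table)
    modes = [score for score, frequency in table if frequency == highest]
    return mean, median, modes


# (score, frequency) rows; the first row is inferred from the totals.
table = [(4, 1), (5, 2), (6, 5), (7, 4), (8, 3)]
mean, median, modes = summarise(table)
print(mean, median, modes)
```

When n is even the position ends in .5, so the sketch averages the scores on either side of that position.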
Mean
• The formula for calculating the mean is the same as the formula used when the data is displayed in a
frequency distribution table:
$$\bar{x} = \frac{\sum (f \times x)}{n}$$
Here, x represents the midpoint (or class centre) of each class interval, f is the corresponding frequency and
n is the total number of observations in a set.
Median
• The median is found by drawing a cumulative frequency curve (ogive) of the data and estimating the
median from the 50th percentile (see section 12.2.3).
Modal class
• The modal class is the class interval that has the highest frequency.
• A percentile is named after the percentage of data that lies at or below that value.
For example, 60% of the data values lie at or below the 60th percentile.
• Percentiles can be read off a percentage cumulative frequency curve.
• A percentage cumulative frequency curve is created by:
• writing the cumulative frequencies as a percentage of the total number of data values
• plotting the percentage cumulative frequencies against the maximum value for each interval.
• For example, the following table and graph show the mass of cartons of eggs ranging from 55 g to 65 g.
Mass (g)    Frequency (f)   Cumulative frequency (cf)   Percentage cumulative frequency (%cf)
55–<57      2               2                           6%
57–<59      6               2 + 6 = 8                   22%
59–<61      12              8 + 12 = 20                 56%
61–<63      11              20 + 11 = 31                86%
63–<65      5               31 + 5 = 36                 100%
[Figure: percentage cumulative frequency curve. Vertical axis: Percentage cumulative frequency, 0 to 100; horizontal axis: Mass (g), 55 to 66]
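The percentage cumulative frequency column can be computed with a short Python sketch. The function name is hypothetical, and the first interval's frequency of 2 is an assumption inferred from the cumulative frequency of 8 at the end of the second interval.

```python
from itertools import accumulate


def percentage_cumulative_frequencies(frequencies):
    """Return each cumulative frequency as a whole-number percentage of the total."""
    cumulative = list(accumulate(frequencies))
    total = cumulative[-1]
    return [round(100 * cf / total) for cf in cumulative]


# Frequencies for the egg-carton intervals 55-<57, 57-<59, 59-<61, 61-<63, 63-<65.
frequencies = [2, 6, 12, 11, 5]
print(percentage_cumulative_frequencies(frequencies))
```

Each percentage would be plotted against the upper end of its interval (57, 59, 61, 63 and 65 g) to draw the curve.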
Class interval   Frequency
60 − < 70        5
70 − < 80        7
80 − < 90        10
90 − < 100       12
100 − < 110      8
110 − < 120      3
Total            45
a. estimate the mean b. estimate the median c. determine the modal class.
THINK WRITE
1. Draw up a table with 5 columns headed Class interval, Class centre (x), Frequency (f), Frequency × class centre (f × x) and Cumulative frequency (cf).
2. Complete the x, f, f × x and cf columns.
    Class interval   Class centre (x)   Frequency (f)   Frequency × class centre (f × x)   Cumulative frequency (cf)
    60 − < 70        65                 5               325                                5
    70 − < 80        75                 7               525                                12
    80 − < 90        85                 10              850                                22
    90 − < 100       95                 12              1140                               34
    100 − < 110      105                8               840                                42
    110 − < 120      115                3               345                                45
                                        n = 45          ∑(f × x) = 4025
a. 1. Write the rule for the mean.
    a. $\bar{x} = \frac{\sum (f \times x)}{n}$
2. Substitute the known values into the rule and evaluate.
    $\bar{x} = \frac{4025}{45} \approx 89.4$
3. Write the answer.
    The mean for the given data is approximately 89.4.
b. 1. Draw a combined cumulative frequency histogram and ogive.
    [Figure: cumulative frequency histogram and ogive. Vertical axis: Cumulative frequency, 0 to 45; horizontal axis: Data, 65 to 115]
2. Locate the middle of the cumulative frequency axis.
3. Draw a horizontal line from this point to the ogive and a vertical line to the horizontal axis.
4. Read off the value of the median from the x-axis and write the answer.
    The median for the given data is approximately 90.
c. 1. The modal class is the class interval with the highest frequency.
    c. The class interval 90–<100 occurs twelve times, which is the highest frequency.
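The grouped-data estimate of the mean can be sketched in Python as follows. The function names are hypothetical. The frequency of 8 for the class 100–<110, and the two upper class intervals themselves, are assumptions inferred from the totals n = 45 and ∑(f × x) = 4025 and from the class centre 115.

```python
def grouped_mean(classes):
    """Estimate the mean using each class centre as the value for its class."""
    n = sum(frequency for _, frequency in classes)
    total = sum((low + high) / 2 * frequency for (low, high), frequency in classes)
    return total / n


def modal_class(classes):
    """Return the (low, high) interval with the highest frequency."""
    interval, _ = max(classes, key=lambda row: row[1])
    return interval


# ((lower bound, upper bound), frequency); upper bounds are exclusive.
classes = [((60, 70), 5), ((70, 80), 7), ((80, 90), 10),
           ((90, 100), 12), ((100, 110), 8), ((110, 120), 3)]
print(round(grouped_mean(classes), 1), modal_class(classes))
```

Because every value in a class is replaced by the class centre, the result is an estimate of the mean, not an exact value.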
Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Video eLesson Mean and median (eles-1905)
Interactivities Individual pathway interactivity: Measures of central tendency (int-4621)
Mean (int-3818)
Median (int-3819)
Mode (int-3820)
Ogives (int-6174)
Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 2, 7, 10, 14, 17, 18, 23 3, 4, 6, 8, 11, 15, 19, 20, 24 5, 9, 12, 13, 16, 21, 22, 25
Fluency
WE1 For questions 1 to 5, calculate:
2. 4, 6, 7, 4, 8, 9, 7, 10
5. 7½, 10¼, 12, 12¼, 13, 13½, 13½, 14
6. The back-to-back stem-and-leaf plot shows the test results of
25 Year 10 students in Mathematics and Science. Calculate the
mean, median and mode for each of the two subjects.
Key: 3 ∣ 2 = 32
Leaf: Science Stem Leaf: Mathematics
873 3 29
96221 4 068
876110 5 135
97432 6 2679
8510 7 3678
73 8 044689
9 258
Class interval   Frequency
40 − < 50        4
50 − < 60        4
60 − < 70        6
70 − < 80        9
80 − < 90        5
90 − < 100       4
Total            32
11. Calculate the mean of the grouped data shown in the table below.
Class interval   Frequency
50 − < 55        1
55 − < 60        3
60 − < 65        4
65 − < 70        5
70 − < 75        3
75 − < 80        2
Total            18
Number of
books sold Frequency
220–229 2
230–239 2
240–249 3
250–259 5
260–269 4
270–279 4
Total 20
Understanding
14. A random sample was taken, composed of 30 people shopping at a supermarket on a Tuesday night. The
amount of money (to the nearest dollar) spent by each person was recorded as follows.
6 32 66 17 45 1 19 52 36 23 28 20 7 47 39
6 68 28 54 9 10 58 40 12 25 49 74 63 41 13
a. Calculate the mean and median amount of money spent at the checkout by the people in this sample.
b. Group the data into class intervals of 10 and complete the frequency distribution table. Use this table to
estimate the mean amount of money spent.
c. Add the cumulative frequency column to your table and fill it in. Hence, construct the ogive.
Use the ogive to estimate the median.
d. Compare the mean and the median of the original data from part a with the mean and the median obtained
for grouped data in parts b and c.
Explain if the estimates obtained in parts b and c were good enough.
15. Answer the following questions and show your working.
a. Add one more number to the set of data 3, 4, 4, 6 so that the mean of a new set is equal to its median.
b. Design a set of five numbers so that mean = median = mode = 5.
c. In the set of numbers 2, 5, 8, 10, 15, change one number so that the median remains unchanged while the
mean increases by 1.
Reasoning
17. The data shown give the age of 25 patients admitted to the emergency ward of a hospital.
a. Present the data in a frequency distribution table. (Use class intervals of 0 − < 15, 15 − < 30 and so on.)
b. Draw a histogram of the data.
c. Suggest a word to describe the pattern of the data in this distribution.
d. Use your table to estimate the mean age of patients admitted.
e. Determine the median class for age of patients admitted.
f. Identify the modal class for age of patients admitted.
g. Draw an ogive of the data.
h. Use the ogive to determine the median age.
i. Explain if any of your statistics (mean, median or mode) give a clear representation of the typical age of an
emergency ward patient.
j. Give some reasons which could explain the pattern of the distribution of data in this question.
18. MC In a set of data there is one score that is extremely small when compared to all the others.
Position Salary ($) Number of employees
Machine operator 18 000 50
Machine mechanic 20 000 15
Floor steward 24 000 10
Manager 62 000 4
Chief executive officer 80 000 1
a. Workers are arguing for a pay rise but the management of the factory
21. The resting pulse rate of 20 female athletes was measured. The results are detailed below.
50, 52, 48, 52, 71, 61, 30, 45, 42, 48,
43, 47, 51, 62, 34, 61, 44, 54, 38, 40
a. Construct a frequency distribution table. (Use class sizes of 1 –< 10, 10 –< 20 and so on.)
b. Use your table to estimate the mean of the data.
c. Determine the median class of the data.
d. Identify the modal class of the data.
e. Draw an ogive of the data. (You may like to use a graphics calculator for this.)
f. Use the ogive to determine the median pulse rate.
Problem solving
23. The numbers 15, a, 17, b, 22, c, 10 and d have a mean of 14. Calculate the mean of a, b, c and d.
24. The numbers m, n, p, q, r, and s have a mean of a while x, y and z have a mean of b. Calculate the mean of
all nine numbers.
25. The mean and median of six two-digit prime numbers is 39 and the mode is 31. The smallest number is 13.
Determine the six numbers.
Newcastle: Mean = 50, Median = 50, No mode
Wollongong: Mean = 50, Median = 50, No mode
With these measures being the same for both data sets we could come to the conclusion that both data sets
are very similar; however, if we look at the data sets, they are very different. We can see that the data for
Newcastle are very clustered around the mean, whereas the Wollongong data are spread out more.
The data from Newcastle are between 40 and 60, whereas the Wollongong data are between 15 and 90.
• Range and interquartile range (IQR) are both measures of spread.
Range
• The most basic measure of spread is the range.
• The range is defined as the difference between the highest and the lowest values in the set of data.
Range = Xmax − Xmin
Calculate the range of the given data set: 2.1, 3.5, 3.9, 4.0, 4.7, 4.8, 5.2.
3. Write the rule for the range.
    Range = Xmax − Xmin
4. Substitute the known values into the rule.
    = 5.2 − 2.1
5. Evaluate and write the answer.
    = 3.1
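The range rule is short enough to sketch directly in Python. The function name `data_range` is an illustrative choice; the list is the data set from this worked example.

```python
def data_range(values):
    """Return the highest value minus the lowest value."""
    return max(values) - min(values)


data = [2.1, 3.5, 3.9, 4.0, 4.7, 4.8, 5.2]
# Round to 1 decimal place to hide floating-point noise in the subtraction.
print(round(data_range(data), 1))
```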
[Figure: ordered data divided into quarters, marking the Minimum, Q1 (Lower quartile), Q2 (Median), Q3 (Upper quartile) and the Maximum]
• The lower quartile (Q1 ) is the median of the lower half of the data set.
• The upper quartile (Q3 ) is the median of the upper half of the data set.
• The IQR is not affected by extremely large or extremely small data values (outliers), so in some
circumstances the IQR is a better indicator of the spread of data than the range.
3, 2, 8, 6, 1, 5, 3, 7, 6.
THINK WRITE
1. Arrange the scores in order.
    1, 2, 3, 3, 5, 6, 6, 7, 8
2. Locate the median and use it to divide the data into two halves.
Note: The median is the 5th score in this data set and should not be included in the lower or upper ends of the data.
    1, 2, 3, 3, (5), 6, 6, 7, 8
3. Find Q1, the median of the lower half of the data.
    $Q_1 = \frac{2+3}{2} = 2.5$
4. Find Q3, the median of the upper half of the data.
    $Q_3 = \frac{6+7}{2} = 6.5$
5. Calculate the interquartile range.
    IQR = Q3 − Q1
    = 6.5 − 2.5
    = 4
page select ‘xvalues’ as the X1 List and leave the Frequency as 1. Leave the remaining fields empty, TAB to OK and press ENTER.
The summary statistics are shown.
The IQR = Q3 − Q1 = 6.5 − 2.5 = 4
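The quartile method used in this lesson (the median of each half, leaving out the middle score when the number of scores is odd) can be sketched in Python. The function names are illustrative. Library routines for quartiles may use a different convention, so this sketch implements the halves directly.

```python
def median(ordered):
    """Return the median of a list that is already sorted."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("at least two data values are needed")
    half = len(ordered) // 2
    lower = ordered[:half]    # excludes the middle score when n is odd
    upper = ordered[-half:]
    return median(lower), median(upper)


scores = [3, 2, 8, 6, 1, 5, 3, 7, 6]
q1, q3 = quartiles(scores)
print(q1, q3, q3 - q1)
```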
The following frequency distribution table gives the number of customers who order different
volumes of concrete from a readymix concrete company during the course of a day.
Calculate the interquartile range of the data.
THINK WRITE/DRAW
1. To calculate the 25th and 75th percentiles from
[Figure: ogive. Horizontal axis: Volume (m³)]
3. Identify the upper quartile (75th percentile) and lower quartile (25th percentile) from the ogive.
    Q3 = 1.6 m³
    Q1 = 0.4 m³
4. The interquartile range is the difference between the upper and lower quartiles.
    IQR = Q3 − Q1
    = 1.6 − 0.4
    = 1.2 m³
Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: Measures of spread (int-4622)
Range (int-3822)
The interquartile range (int-4813)
Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 4, 7, 10, 13 2, 6, 8, 11, 14 3, 5, 9, 12, 15
Fluency
1. WE4 Calculate the range for each of the following sets of data.
a. 4, 3, 9, 12, 8, 17, 2, 16
b. 49.5, 13.7, 12.3, 36.5, 89.4, 27.8, 53.4, 66.8
c. 7½, 12¾, 5¼, 8⅔, 9⅙, 3¾
2. WE5 Calculate the interquartile range (IQR) for the following sets of data.
a. 3, 5, 8, 9, 12, 14
b. 7, 10, 11, 14, 17, 23
c. 66, 68, 68, 70, 71, 74, 79, 80
d. 19, 25, 72, 44, 68, 24, 51, 59, 36
3. The following stem-and-leaf plot shows the mass of newborn babies (rounded to the nearest 100g).
Calculate:
a. the range of the data b. the IQR of the data.
Key: 1∗ ∣ 9 = 1.9 kg
Stem Leaf
1* 9
2 2 4
2* 6 7 8 9
3 0 0 1 2 3 4
3* 5 5 6 7 8 8 8 9
4 0 1 3 4 4
4* 5 6 6 8 9
5 0 1 2 2
4. Use the ogive shown to calculate the interquartile range of the data.
[Figure: ogive. Horizontal axis: Height (cm), 100 to 180; left vertical axis: Cumulative frequency, 0 to 50; right vertical axis: Cumulative frequency (%), 0 to 100%]
Christmas presents.
Time (h) 0 − < 0.5 0.5 − < 1 1 − < 1.5 1.5 − < 2 2 − < 2.5 2.5 − < 3 3 − < 3.5 3.5 − < 4
Frequency 1 2 7 15 13 8 2 2
Understanding
7. The following frequency distribution table shows the life expectancy in hours of 40 household batteries.
a. Draw an ogive curve that represents the data in the table above.
b. Use the ogive to answer the following questions.
i. Calculate the median score.
ii. Determine the upper and lower quartiles.
iii. Calculate the interquartile range.
iv. Identify the number of batteries that lasted less than 60 hours.
v. Identify the number of batteries that lasted 70 hours or more.
11. As newly appointed coach of Terrorolo’s Meteors netball team, Kate decided to record each player’s
statistics for the previous season. The number of goals scored by the leading goal shooter was:
1, 3, 8, 18, 19, 23, 25, 25, 25, 26, 27, 28,
28, 28, 28, 29, 29, 30, 30, 33, 35, 36, 37, 40
a. Determine the mean, median, range and interquartile range of each set.
b. Write a short paragraph comparing the two distributions.
Problem solving
13. Calculate the mean, median, mode, range and IQR of the following data
collected when the temperature of the soil around 25 germinating seedlings
was recorded:
28.9, 27.4, 23.6, 25.6, 21.1, 22.9, 29.6, 25.7, 27.4, 23.6, 22.4, 24.6, 21.8,
26.4, 24.9, 25.0, 23.5, 26.1, 23.6, 25.3, 29.5, 23.5, 22.0, 27.9, 23.6.
14. Four positive numbers a, b, c and d have a mean of 12, a median and mode
of 9 and a range of 14. Determine the values of a, b, c and d.
15. A set of five positive integer scores has the following summary statistics:
• range = 9
• median = 6
• Q1 = 3 and Q3 = 9.
a. Explain whether the five scores could be 1, 3, 6, 9 and 10.
b. A sixth score is added to the set. Determine whether there is a score that will maintain the summary
statistics given above. Justify your answer.
THINK WRITE
a. The interquartile range is the difference between the upper and lower quartiles.
    a. IQR = Q3 − Q1
    = 44 − 37
    = 7
b. The range is the difference between the highest score and the lowest score.
    b. Range = 19
[Figure: box plot drawn above a scale, labelled with the lowest score (Xmin), the lower quartile (Q1), the median (M), the upper quartile (Q3) and the largest score (Xmax)]
Identifying outliers
Lower limit = Q1 − 1.5 × IQR
Upper limit = Q3 + 1.5 × IQR
Any scores that sit outside these limits are considered outliers.
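The two limits can be sketched in Python as follows. The function names are hypothetical. The quartiles Q1 = 84.5 and Q3 = 94.5 are taken from the speed-camera worked example in this lesson, and the speeds are read from its stem-and-leaf plot on the assumption that 8 | 2 means 82.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that sit outside the lower and upper limits."""
    lower, upper = outlier_limits(q1, q3)
    return [value for value in values if value < lower or value > upper]


speeds = [82, 82, 84, 84, 84, 84, 85, 85, 86, 86, 87, 89, 89,
          89, 90, 91, 91, 92, 94, 95, 96, 99, 100, 102, 114]
print(outlier_limits(84.5, 94.5))
print(find_outliers(speeds, 84.5, 94.5))
```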
Examples of a symmetrical stem-and-leaf plot and a symmetrical box plot are shown.
Stem Leaf
26* 6
27 013
27* 5689
28 011124
20 22 24 26 28 30
28* 5788
29 222
29* 5
The following stem-and-leaf plot gives the speed of 25 cars caught by a roadside speed camera.
a. Prepare a five-number summary of the data and draw a box plot to represent it.
b. Identify any outliers and redraw the box plot with outliers marked.
c. Describe the distribution of the data.
THINK WRITE
1. First identify the positions of the median and upper and lower quartiles. There are 25 data values. The median is the $\left(\frac{n+1}{2}\right)$th score. The lower quartile is the median of the lower half of the data. The upper quartile is the median of the upper half of the data (each half contains 12 scores).
    The median is the $\left(\frac{25+1}{2}\right)$th score; that is, the 13th score.
    Q1 is the $\left(\frac{12+1}{2}\right)$th score in the lower half; that is, the 6.5th score. That is, halfway between the 6th and 7th scores.
    Q3 is halfway between the 6th and 7th scores in the upper half of the data.
    Stem Leaf
    8 2 2 4 4 4 4
    8* 5 5 6 6 7 9 9 9
    9 0 1 1 2 4
    9* 5 6 9
    10 0 2
    10*
    11 4
    (Q1, the median and Q3 are marked on the plot.)
b. 1. Calculate the IQR.
    b. IQR = Q3 − Q1
    = 94.5 − 84.5
    = 10
line as:
FiveNumSummary cars
Press ENTER.
Then press VAR, select ‘stat.results’ and press ENTER.
statistics.
minX = 82
Q1 = 84.5
Med = 89
Q3 = 94.5
maxX = 114
b. To construct the box-and-whisker plot, open a Data & Statistics page. Press TAB to locate the label of the horizontal axis and select the variable ‘cars’. Then press:
• MENU
• 1: Plot Type
• 2: Box Plot
To change the colour, place the pointer over one of the data points. Then press CTRL MENU. Then press:
• 6: Color
• 2: Fill Color.
Select whichever colour you like from the palette. Press ENTER.
The box-and-whisker plot is displayed. As you scroll over the box-and-whisker plot, the values of the five-number summary statistics are displayed. The data are skewed (positively).
b. To construct the box-and-whisker plot, tap:
• SetGraph
• Setting...
Set:
• Type: MedBox
• XList: main\cars
• Freq: 1
Then tap the graphing icon.
The box-and-whisker plot is displayed.
[Figure: plots drawn on an Amount of money ($) axis from 0 to 10]
• Both graphs indicate that the data is positively skewed and both graphs indicate the presence of the outlier.
However, the box plot provides a more concise summary of the centre and spread of the distribution.
15, 22, 14, 12, 21, 34, 19, 11, 13, 0, 16,
4, 23, 8, 12, 18, 24, 17, 14, 3, 10, 12,
9, 15, 20, 5, 19, 13, 17, 11, 16, 19, 24,
12, 7, 14, 17, 10, 14, 23
• The data are displayed in the histogram and box plot shown.
• Both graphs indicate that the data are approximately symmetrical with an outlier. The histogram clearly shows the frequencies of each class interval. Neither graph displays the original values.
• The histogram does not give precise information about the centre, but the distribution of the data is visible.
• However, the box plot shows the presence of an outlier and provides a summary of the centre and spread of the distribution.
[Figure: histogram (Frequency against Number of minutes, 0 to 35) and box plot on the same Number of minutes axis, with the outlier marked ×]
Each member of a class was given a jelly snake to stretch. They each
measured the initial length of their snake to the nearest centimetre and
then slowly stretched the snake to make it as long as possible.
They then measured the maximum length of the snake by recording how
far it had stretched at the time it broke. The results were recorded in the
following table.
Initial length Stretched Initial length Stretched
(cm) length (cm) (cm) length (cm)
13 29 14 27
14 28 13 27
17 36 15 36
10 24 16 36
14 35 15 36
16 36 16 34
15 37 17 35
16 37 12 27
14 30 9 17
16 33 16 41
17 36 17 38
16 38 16 36
17 38 17 41
14 31 16 33
17 40 11 21
The above data was drawn on parallel box plots as shown below.
[Figure: parallel box plots labelled Stretched and Initial]
Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 4, 6, 9, 12, 13, 16, 19 2, 5, 7, 10, 14, 17, 20 3, 8, 11, 15, 18, 21, 22
Fluency
1. WE7 From the following five-number summary, calculate:
Xmin Q1 Median Q3 Xmax
6 11 13 16 32
a. the interquartile range b. the range.
4. The box plot shows the distribution of final points scored by a football team over a season’s roster.
[Figure: box plot on a Score axis from 5 to 30]
6. MC The median of the data is:
A. 20 B. 23 C. 25 D. 31 E. 5
8. MC Select which of the following is not true of the data represented by the box plot.
A. One-quarter of the scores are between 5 and 20.
B. Half of the scores are between 20 and 25.
C. The lowest quarter of the data is spread over a wide range.
D. Most of the data are contained between the scores of 5 and 20.
E. One-third of the scores are between 5 and 20.
Understanding
9. The number of sales made each day by a salesperson is recorded over a 2-week period:
11. WE8 The stem-and-leaf plot shown details the age of 25 offenders who were
Key: 1|8 = 18 years
13. Prepare comparative box plots for the following dot plots (using the same axis) and describe what each plot
reveals about the data.
a. Number of sick days taken by workers last year at factory A
0 1 2 3 4 5 6 7
b. Number of sick days taken by workers last year at factory B
0 2 4 6 8 10 12 14
14. An investigation into the transport needs of an outer suburb community recorded
the number of passengers boarding a bus during each of its journeys, as follows.
12, 43, 76, 24, 46, 24, 21, 46, 54, 109, 87, 23, 78,
37, 22, 139, 65, 78, 89, 52, 23, 30, 54, 56, 32, 66, 49
Display the data by constructing a histogram using class intervals of 20 and a
comparative box plot on the same axis.
15. WE12 At a weight-loss clinic, the following weights (in kilograms) were recorded before and
after treatment.
Before 75 80 75 140 77 89 97 123 128 95 152 92
After 69 66 72 118 74 83 89 117 105 81 134 85
Reasoning
16. Explain the advantages and disadvantages of box plots as a visual form of representing data.
18. The following data show the ages of 30 mothers upon the birth of their first baby.
22, 21, 18, 33, 17, 23, 22, 24, 24, 20,
25, 29, 32, 18, 19, 22, 23, 24, 28, 20,
31, 22, 19, 17, 23, 48, 25, 18, 23, 20
Problem solving
19. Sketch a histogram for the box plot shown.
20. Consider the box plot below, which shows the number of weekly sales of houses by two real estate agencies.
HJ Looker
0 1 2 3 4 5 6 7 8 9 10
Number of weekly sales
a. Determine the median number of weekly sales for each real estate agency.
b. State which agency had the greater range of sales. Justify your answer.
c. State which agency had the greater interquartile range of sales. Justify your answer.
d. State which agency performed better. Explain your answer.
21. Fifteen French restaurants were visited by three newspaper restaurant reviewers. The average price of a meal
for a single person was investigated. The following box plot shows the results.
[Figure: plot by Age group: Under 20, 20–24, 25–29]
LESSON
12.5 The standard deviation (Optional)
LEARNING INTENTIONS
At the end of this lesson you should be able to:
• calculate the standard deviation of a small data set by hand
• calculate the standard deviation using technology
• interpret the mean and standard deviation of data
• identify the effect of outliers on the standard deviation.
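• Deviation is the difference between each data value and the mean, (x − x̄). The standard deviation is a measure of how far the data values are spread out from the mean.
• The standard deviation, σ, is calculated from the squares of the deviations, (x − x̄)², using the following formula.
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
where x̄ is the mean of the data values and n is the number of data values.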
• A low standard deviation indicates that the data values tend to be close to the mean.
• A high standard deviation indicates that the data values tend to be spread out over a large range, away from
the mean.
• Standard deviation can be calculated using a scientific or graphics calculator, or it can be calculated from a
frequency table by following the steps below.
Step 1 Calculate the mean.
Step 2 Calculate the deviations.
Step 3 Square each deviation.
Step 4 Sum the squares.
Step 5 Divide the sum of the squares by the number of data values.
Step 6 Take the square root of the result.
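The six steps can be sketched in Python as follows. The function name is an illustrative choice, and the sample list is the lolly data from the worked example that follows.

```python
import math


def population_standard_deviation(values):
    """Follow Steps 1 to 6 to calculate the standard deviation."""
    n = len(values)
    mean = sum(values) / n                        # Step 1: calculate the mean
    deviations = [x - mean for x in values]       # Step 2: calculate the deviations
    squares = [d ** 2 for d in deviations]        # Step 3: square each deviation
    total = sum(squares)                          # Step 4: sum the squares
    average_square = total / n                    # Step 5: divide by the number of values
    return math.sqrt(average_square)              # Step 6: take the square root


lollies = [11, 12, 13, 14, 16, 17, 18, 19]
print(round(population_standard_deviation(lollies), 2))
```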
The number of lollies in each of 8 packets is 11, 12, 13, 14, 16, 17, 18, 19.
Calculate the mean and standard deviation correct to 2 decimal places. Interpret the result.
THINK WRITE
1. Calculate the mean.
    $\bar{x} = \frac{11 + 12 + 13 + 14 + 16 + 17 + 18 + 19}{8} = 15$
2. Calculate the deviations, (x − x̄).
    No. of lollies (x)   (x − x̄)
    11                   11 − 15 = −4
    12                   −3
    13                   −2
    14                   −1
    16                   1
    17                   2
    18                   3
    19                   4
    Total
3. Square each deviation and sum the results: ∑(x − x̄)².
    No. of lollies (x)   (x − x̄)   (x − x̄)²
    11                   −4         16
    12                   −3         9
    13                   −2         4
    14                   −1         1
    16                   1          1
    17                   2          4
    18                   3          9
    19                   4          16
    Total                           ∑(x − x̄)² = 60
4. Divide the sum of the squares by the number of data values, then take the square root of the result.
    $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{60}{8}} \approx 2.74$ (correct to 2 decimal places)
5. Check the result using a calculator.
    The calculator returns an answer of σn = 2.73861. The answer is confirmed.
6. Interpret the result.
    The average (mean) number of lollies in each pack is 15 with a standard deviation of 2.74, which means that the number of lollies in each pack differs from the mean by an average of 2.74.
to calculate the mean 15 and the population standard deviation is σ = 2.74.
Tap OK.
• When calculating the standard deviation from a frequency table, the frequencies must be taken into
account. Therefore, the following formula is used.
$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}}$$
Lucy’s scores in her last 12 games of golf were 87, 88, 88, 89, 90, 90, 90, 92, 93, 93, 95 and 97.
Calculate the mean score and the standard deviation correct to 2 decimal places. Interpret your
result.
THINK WRITE
1. To calculate the mean, first set up a frequency table.
    Golf score (x)   Frequency (f)   fx
    87               1               87
    88               2               176
    89               1               89
    90               3               270
    92               1               92
    93               2               186
    95               1               95
    97               1               97
    Total            ∑f = 12         ∑fx = 1092
2. Calculate the mean.
    $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1092}{12} = 91$
3. To calculate the deviations (x − x̄), add another column to the frequency table and complete.
4. Add another column to the table and multiply the square of the deviations, (x − x̄)², by the frequency. Then sum the results: ∑f(x − x̄)².
    Golf score (x)   Frequency (f)   fx          (x − x̄)        f(x − x̄)²
    87               1               87          87 − 91 = −4   1 × (−4)² = 16
    88               2               176         −3             18
    89               1               89          −2             4
    90               3               270         −1             3
    92               1               92          1              1
    93               2               186         2              8
    95               1               95          4              16
    97               1               97          6              36
    Total            ∑f = 12         ∑fx = 1092                 102
5. Calculate the standard deviation using the formula.
    $\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}} = \sqrt{\frac{102}{12}} \approx 2.92$ (correct to 2 decimal places)
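The frequency-table calculation can be sketched in Python as follows. The function name is hypothetical; the (score, frequency) pairs are Lucy's golf scores from this worked example.

```python
import math


def frequency_standard_deviation(table):
    """Return (mean, standard deviation) for a list of (score, frequency) rows."""
    n = sum(f for _, f in table)
    mean = sum(f * x for x, f in table) / n
    # Weight each squared deviation by how often the score occurs.
    weighted_squares = sum(f * (x - mean) ** 2 for x, f in table)
    return mean, math.sqrt(weighted_squares / n)


golf = [(87, 1), (88, 2), (89, 1), (90, 3), (92, 1), (93, 2), (95, 1), (97, 1)]
mean, sigma = frequency_standard_deviation(golf)
print(mean, round(sigma, 2))
```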
• When the data value is less than the mean (x < x̄), the deviation is negative.
• When the data value is greater than the mean (x > x̄), the deviation is positive.
• The negative and positive deviations cancel each other out; therefore, calculating the sum and average of the deviations is not useful.
• By squaring all of the deviations, each deviation becomes positive, so the average of the deviations becomes meaningful. This explains why the standard deviation is calculated using the squares of the deviations, (x − x̄)², for all data values.
$s = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{n - 1}}$
• Calculators usually display both values for the standard deviation, so it is important to understand the
difference between them.
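The difference between the two values can be seen with Python's standard `statistics` module, where `pstdev` divides by n (population) and `stdev` divides by n − 1 (sample). This is a small sketch using Lucy's 12 golf scores from Worked example 11; the variable names are illustrative only.

```python
from statistics import pstdev, stdev

# Lucy's scores in her last 12 games of golf.
scores = [87, 88, 88, 89, 90, 90, 90, 92, 93, 93, 95, 97]

population_sd = pstdev(scores)  # divides the sum of squared deviations by n
sample_sd = stdev(scores)       # divides the sum of squared deviations by n - 1

print(round(population_sd, 2), round(sample_sd, 2))  # 2.92 3.05
```

Because the sample formula divides by a smaller number, the sample standard deviation is always slightly larger than the population standard deviation for the same data.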
THINK WRITE
1. Use a calculator to calculate the mean and the standard deviation.
$\bar{x} = 88.6154 \approx 88.62$
$\sigma = 8.7225 \approx 8.72$
2. Interpret the result and compare it to the results found in Worked example 11.
In the first 12 games Lucy’s mean score was 91 with a standard deviation of 2.92. This implies that Lucy’s scores were, on average, 2.92 either side of her mean of 91. Lucy’s latest performance resulted in a mean score of 88.62 with a standard deviation of 8.72. This indicates a slightly lower mean score, but the much higher standard deviation indicates that the data are now much more spread out.
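• If all data values in a set are multiplied by a constant k, the deviations $(x - \bar{x})$ will be multiplied by k, that is, $k(x - \bar{x})$; consequently the standard deviation is also multiplied by k (for positive k).
• If a constant is added to all data values in a set, the deviations $(x - \bar{x})$ remain unchanged; consequently the standard deviation remains unchanged.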
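THINK WRITE
a. 1. Calculate the mean.
$\bar{x} = \dfrac{5 + 9 + 6 + 11 + 10 + 7}{6} = 8$
2. Set up a frequency table and enter the squares of the deviations.
(x)      $(x - \bar{x})$    $(x - \bar{x})^2$
5        5 − 8 = −3         9
6        −2                 4
7        −1                 1
9        1                  1
10       2                  4
11       3                  9
Total                       $\sum (x - \bar{x})^2 = 28$
3. To calculate the standard deviation, apply the formula.
$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{28}{6}} \approx 2.16$
b. 1. Add 4 to each data value in the set.
9, 13, 10, 15, 14, 11
2. Calculate the mean.
$\bar{x} = \dfrac{9 + 13 + 10 + 15 + 14 + 11}{6} = 12$
3. Set up a frequency table and enter the squares of the deviations.
(x)      $(x - \bar{x})$    $(x - \bar{x})^2$
9        9 − 12 = −3        9
10       −2                 4
11       −1                 1
13       1                  1
14       2                  4
15       3                  9
Total                       $\sum (x - \bar{x})^2 = 28$
4. To calculate the standard deviation, apply the formula.
$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{28}{6}} \approx 2.16$
c. 1. Multiply each data value in the set by 2.
10, 18, 12, 22, 20, 14
2. Calculate the mean.
$\bar{x} = \dfrac{10 + 18 + 12 + 22 + 20 + 14}{6} = 16$
3. Set up a frequency table and enter the squares of the deviations.
(x)      $(x - \bar{x})$    $(x - \bar{x})^2$
10       10 − 16 = −6       36
12       −4                 16
14       −2                 4
18       2                  4
20       4                  16
22       6                  36
Total                       $\sum (x - \bar{x})^2 = 112$
4. To calculate the standard deviation, apply the formula.
$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{112}{6}} \approx 4.32$
5. Comment on the effect of multiplying each data value by 2.
Multiplying each data value by 2 doubled the mean and doubled the standard deviation, which changed from 2.16 to 4.32.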
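The effect of adding a constant or multiplying by a constant can be checked quickly in code. This is a minimal Python sketch, assuming the data set 5, 9, 6, 11, 10, 7 used in the worked example; the names `shifted` and `scaled` are illustrative.

```python
from statistics import mean, pstdev

data = [5, 9, 6, 11, 10, 7]
shifted = [x + 4 for x in data]   # add 4 to each data value
scaled = [2 * x for x in data]    # multiply each data value by 2

for label, values in [("original", data), ("add 4", shifted), ("times 2", scaled)]:
    # Print the mean and the population standard deviation of each set.
    print(label, mean(values), round(pstdev(values), 2))
```

Adding 4 moves the mean from 8 to 12 but leaves the standard deviation at about 2.16, while doubling every value doubles both the mean and the standard deviation.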
THINK WRITE
a. 1. Put the data in ascending order.
1, 3, 4, 5, 17
2. Locate the median.
1, 3, 4, 5, 17 (the median is 4)
6. Write the answer.
The median absolute deviation is 1.
b. 1. Calculate the mean.
$\bar{x} = \dfrac{5 + 3 + 17 + 1 + 4}{5} = \dfrac{30}{5} = 6$
2. Calculate the absolute deviations by calculating how far away each data point is from the mean, using positive distances.
x        $|x - \bar{x}|$
5        1
3        3
17       11
1        5
4        2
3. Calculate the mean absolute deviation.
$\text{MAD} = \dfrac{1 + 3 + 11 + 5 + 2}{5} = \dfrac{22}{5} = 4.4$
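Both absolute-deviation measures can be sketched in a few lines of Python. The function names below are illustrative assumptions rather than library functions; each follows the steps described above for the data 5, 3, 17, 1, 4.

```python
from statistics import mean, median


def median_absolute_deviation(data):
    """Median of the absolute distances from the median."""
    centre = median(data)
    return median([abs(x - centre) for x in data])


def mean_absolute_deviation(data):
    """Mean of the absolute distances from the mean."""
    centre = mean(data)
    return mean([abs(x - centre) for x in data])


data = [5, 3, 17, 1, 4]
print(median_absolute_deviation(data))  # 1
print(mean_absolute_deviation(data))    # 4.4
```

The single large value 17 pulls the mean absolute deviation up to 4.4, while the median absolute deviation stays at 1.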
Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: The standard deviation (int-4624)
The standard deviation for a sample (int-4814)
Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 2, 6, 10, 11, 12, 15 3, 4, 7, 9, 13, 16 5, 8, 14, 17
Fluency
1. WE10 Calculate the standard deviation of each of the following data sets, correct to 2 decimal places.
a. 3, 5, 8, 2, 7, 1, 6, 5 b. 11, 8, 7, 12, 10, 11, 14
c. 25, 15, 78, 35, 56, 41, 17, 24 d. 5.2, 4.7, 5.1, 12.6, 4.8
2. WE14a Calculate the median absolute deviation for each of the data sets in question 1.
3. WE14b Calculate the mean absolute deviation for each of the following data sets, correct to 2 decimal places.
a. 3, 5, 8, 2, 7, 1, 6, 5
b. 25, 15, 78, 35, 56, 41, 17, 24
4. Calculate the standard deviation of each of the following data sets, correct to 2 decimal places.
5. Complete the following frequency distribution table and use it to calculate the standard deviation of the data
set, correct to 2 decimal places.
1.8, 1.95, 1.87, 1.77, 1.75, 1.79, 1.81, 1.83, 1.76, 1.80, 1.92, 1.87, 1.85, 1.83
Calculate the mean score and the standard deviation for this set of data. Express your answers correct to
2 decimal places.
8. Times (to the nearest tenth of a second) for the heats in the open 100-m sprint at the school sports are given in the stem-and-leaf plot shown.
Calculate the standard deviation for this set of data and express your answer correct to 2 decimal places.
Key: 11|0 = 11.0 s
Stem Leaf
11 0
11 2 3
11 4 4 5
11 6 6
11 8 8 9
12 0 1
12 2 2 3
12 4 4
12 6
12 9
9. The number of outgoing phone calls from an office each day over a 4-week period is shown in the stem-and-
leaf plot.
Key: 1|3 = 13 calls
Stem Leaf
0 89
1 3479
2 01377
3 34
4 15678
5 38
Calculate the standard deviation for this set of data and express your answer correct to 2 decimal places.
people who have made use of the service each day during this period is set out in the stem-and-leaf plot shown.
Stem Leaf
0 24
0 779
1 014444
1 5667889
2 122333
2 7
The standard deviation (to 2 decimal places) of these data is:
A. 6.00
B. 6.34
C. 6.47
D. 15.44
E. 9.37
11. WE12 The speeds, in km/h, of the first 25 cars caught by a roadside speed camera on a particular day were:
82, 82, 84, 84, 84, 84, 85, 85, 85, 86, 86, 87, 89, 89, 89, 90, 91, 91, 92, 94, 95, 96, 99, 100, 102
The next car that passed the speed camera was travelling at 140 km/h.
Comment on the effect of the speed of this last car on the standard deviation for the data.
Reasoning
12. Explain what the standard deviation tells us about a set of data.
14. Show using an example the effect, if any, on the standard deviation of adding a data value to a set of data
that is equivalent to the mean.
Problem solving
15. If the mean for a set of data is 45 and the standard deviation is 6, determine how many standard deviations
above the mean is a data value of 57.
16. Five numbers a, b, c, d and e have a mean of 12 and a standard deviation of 4.
a. If each number is increased by 3, calculate the new mean and standard deviation.
b. If each number is multiplied by 3, calculate the new mean and standard deviation.
17. Twenty-five students sat a test and the results for 24 of the students are given in the following stem-and-leaf
plot.
Key: 1|2 = 12 marks
Stem Leaf
0 89
1 123789
2 23568
3 012468
4 02568
a. If the average mark for the test was 27.84, determine the mark obtained by the 25th student.
b. Determine how many students scored higher than the median score.
c. Calculate the standard deviation of the marks, giving your answer correct to 2 decimal places.
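THINK WRITE
1. Calculate the mean of the first set of data.
$\bar{x}_1 = \dfrac{6 + 7 + 8 + 9 + 10}{5} = 8$
2. Calculate the mean of the second set of data.
$\bar{x}_2 = \dfrac{12 + 4 + 10 + 11 + 3}{5} = 8$
3. Calculate the standard deviation of the first set of data.
$\sigma_1 = \sqrt{\dfrac{(6 - 8)^2 + (7 - 8)^2 + (8 - 8)^2 + (9 - 8)^2 + (10 - 8)^2}{5}} \approx 1.41$
4. Calculate the standard deviation of the second set of data.
$\sigma_2 = \sqrt{\dfrac{(12 - 8)^2 + (4 - 8)^2 + (10 - 8)^2 + (11 - 8)^2 + (3 - 8)^2}{5}} \approx 3.74$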
• When multiple data displays are used to display similar sets of data, comparisons and conclusions can then
be drawn about the data.
• We can use back-to-back stem-and-leaf plots and parallel box plots to help compare statistics such as
the median, range and interquartile range.
Below are the scores achieved by two students in eight Mathematics tests throughout the year.
John: 45, 62, 64, 55, 58, 51, 59, 62
Penny: 84, 37, 45, 80, 74, 44, 46, 50
a. Identify the student who performed better over the eight tests. Justify your answer.
b. Identify the student who was more consistent over the eight tests. Justify your answer.
THINK WRITE
a. John: $\bar{x} = 57$, $\sigma = 6$
Penny: $\bar{x} = 57.5$, $\sigma \approx 17.42$ (correct to 2 decimal places)
3. To draw the two box plots on the same Data & Statistics page, press TAB to locate the label of the horizontal axis and select the variable ‘john’.
Then press:
• MENU
• 1: Plot Type
• 2: Box Plot
Then press:
• MENU
• 2: Plot Properties
• 5: Add X-variable and select ‘penny’.
To change the colour, place the pointer over one of the data points. Then press CTRL MENU. Then press:
• 6: Color
• 2: Fill Color
Select whichever colour you like from the palette for each of the box plots.
To draw the two box-and-whisker plots on the same Statistics screen, tap:
• SetGraph
• Setting...
Set values as:
• Type: MedBox
• XList: main\John
• Freq: 1
Tap 2 in the row of numbers at the top of the screen.
Set values as:
• Type: MedBox
• XList: main\Penny
• Freq: 1
Tap Set.
Tap SetGraph and tick StatGraph1 and StatGraph2.
Tap the graphing icon to display the graphs.
Penny performed slightly better overall as her mean mark was higher than John’s; however, John was more consistent as his standard deviation was lower than Penny’s.
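As an alternative to a graphing calculator, the comparison of John's and Penny's marks can be sketched in Python. This is only an illustration: the dictionary `marks` and the names `best` and `most_consistent` are assumptions introduced here, with the higher mean taken as "performed better" and the lower population standard deviation taken as "more consistent".

```python
from statistics import mean, pstdev

marks = {
    "John": [45, 62, 64, 55, 58, 51, 59, 62],
    "Penny": [84, 37, 45, 80, 74, 44, 46, 50],
}

# Store (mean, population standard deviation) for each student.
summary = {name: (mean(scores), pstdev(scores)) for name, scores in marks.items()}
for name, (avg, sd) in summary.items():
    print(f"{name}: mean = {avg}, standard deviation = {sd:.2f}")

best = max(summary, key=lambda name: summary[name][0])             # highest mean
most_consistent = min(summary, key=lambda name: summary[name][1])  # lowest spread
print(best, most_consistent)  # Penny John
```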
Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: Comparing data sets (int-4625)
Back-to-back stem plots (int-6252)
Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 5, 8, 10, 11, 17 2, 4, 6, 9, 12, 13, 18 3, 7, 14, 15, 16, 19, 20
Fluency
1. WE15 For the two sets of data, 65, 67, 61, 63, 62, 60 and 56, 70, 65, 72, 60, 55:
a. calculate the mean
b. calculate the standard deviation
c. comment on the similarities and differences.
2. A bank surveys the average morning and afternoon waiting times for customers. The figures were taken each
Monday to Friday in the morning and afternoon for one month. The stem-and-leaf plot below shows
the results.
a. Identify the median morning waiting time and the median afternoon waiting time.
b. Calculate the range for morning waiting times and the range for afternoon waiting times.
c. Use the information given in the display to comment about the average waiting time at the bank in the
morning compared with the afternoon.
3. In a class of 30 students there are 15 boys and 15 girls. Their heights are measured in metres and are
listed below.
Boys: 1.65, 1.71, 1.59, 1.74, 1.66, 1.69, 1.72, 1.66, 1.65, 1.64, 1.68, 1.74, 1.57, 1.59, 1.60
Girls: 1.66, 1.69, 1.58, 1.55, 1.51, 1.56, 1.64, 1.69, 1.70, 1.57, 1.52, 1.58, 1.64, 1.68, 1.67
Display this information in a back-to-back stem-and-leaf plot and comment on their height distribution.
4. The stem-and-leaf plot shown is used to display the number of vehicles sold by the Ford and Hyundai dealerships in a Sydney suburb each week for a three-month period.
Key: 1|5 = 15 vehicles
Leaf: Ford    Stem    Leaf: Hyundai
74            0       39
952210        1       111668
8544          2       2279
0             3       5
a. State the median of both distributions.
b. Calculate the range of both distributions.
c. Calculate the interquartile range of both distributions.
d. Show both distributions on a box plot.
[Figure labels: Sydney Swans, Brisbane Lions]
Understanding
6. Tanya measures the heights (in m) of a group of Year
10 boys and girls and produces the following five-point
summaries for each data set.
Boys: 1.45, 1.56, 1.62, 1.70, 1.81
Girls: 1.50, 1.55, 1.62, 1.66, 1.73
a. Draw a box plot for both sets of data and display them
on the same scale.
b. Calculate the median of each distribution.
c. Calculate the range of each distribution.
d. Calculate the interquartile range for each distribution.
e. Comment on the spread of the heights among the boys
and the girls.
7. The box plots show the average daily sales of cold drinks at the school canteen in summer and winter.
Reasoning
10. WE16 Cory recorded his marks for each test that he did in English and Science throughout the year.
English: 55, 64, 59, 56, 62, 54, 65, 50
Science: 35, 75, 81, 32, 37, 62, 77, 75
a. Identify the subject in which Cory received a better average. Justify your answer.
b. Identify the subject in which Cory performed more consistently. Justify your answer.
12. The police set up two radar speed checks on a back street of Sydney and on a main road. In both places the
speed limit is 60 km/h. The results of the first 10 cars that have their speed checked are given below.
Back street: 60, 62, 58, 55, 59, 56, 65, 70, 61, 64
Main road: 55, 58, 59, 50, 40, 90, 54, 62, 60, 60
a. Calculate the mean and standard deviation of the readings taken at each point.
b. Identify the road where drivers are generally driving faster. Justify your answer.
c. Identify the road where the spread of readings is greater. Justify your answer.
13. In boxes of Smarties it is advertised that there are 50 Smarties in each box. Two machines are used to
distribute the Smarties into the boxes. The results from a sample taken from each machine are shown in the
stem-and-leaf plot.
Key: 5|1 = 51 5 ∗ |6 = 56
Leaf: Machine A Stem Leaf: Machine B
4 4
99877665 4∗ 57899999999
43222111000000 5 0000011111223
55 5∗ 9
Group A (drug)
25, 29, 32, 45, 18, 21, 37, 42, 62, 13,
42, 38, 44, 42, 35, 47, 62, 17, 34, 32
Group B (placebo)
25, 17, 35, 42, 35, 28, 20, 32, 38, 35,
34, 32, 25, 18, 22, 28, 21, 24, 32, 36
a. Display the data on a back-to-back stem-and-leaf plot.
b. Display the data for both groups on a parallel box plot.
c. Make comparisons of the data. Use statistics in your answer.
d. Explain if the drug works. Justify your answer.
e. Determine other considerations that should be taken into account when trying to draw conclusions from
an experiment of this type.
18. Kloe compares her English and Maths marks. The results of eight tests in each subject are shown below.
English: 76, 64, 90, 67, 83, 60, 85, 37
Maths: 80, 56, 92, 84, 65, 58, 55, 62
a. Calculate Kloe’s mean mark in each subject.
b. Calculate the range of marks in each subject.
c. Calculate the standard deviation of marks in each subject.
d. Based on the above data, determine the subject that Kloe has performed more consistently in.
19. A sample of 50 students was surveyed on whether they owned an iPad or a mobile phone. The results
showed that 38 per cent of the students owned both.
Sixty per cent of the students owned a mobile phone and there were four students who had an iPad only.
Evaluate the percentage of students that did not own a mobile phone or an iPad.
20. The life expectancy of non-Aboriginal and non–Torres Strait Islander people in Australian states and
territories is shown on the box plot below.
[Box plot: Life expectancy of non-Aboriginal and non–Torres Strait Islander people in Australian states and territories; axis scale from 70 to 85]
The life expectancies of Aboriginal and Torres Strait Islander people in each of the Australian states and
territories are 56, 58.4, 51.3, 57.8, 53.9, 55.4 and 61.0.
a. Draw parallel box plots on the same axes. Compare and comment on your results.
b. Comment on the advantage and disadvantage of using a box plot.
12.7.1 Populations
• The term population refers to a complete set of individuals,
objects or events belonging to some category.
• When data are collected from a whole population, the
process is known as a census.
• It is often not possible, nor cost-effective, to conduct
a census.
• For this reason, samples have to be selected carefully from the population. A sample is a subset of a population.
[Figure: a sample (size n) drawn from a population (size N)]
THINK WRITE
Consider how the data might be collected Firstly, it would be almost impossible to find all of the possums
and the problems in obtaining these data. in a local area in order to count them.
Secondly, possums are likely to stray into neighbouring areas,
making it impossible to know if they belong to the area being
observed.
Thirdly, possums are more active at night-time, making it
harder to detect their presence.
12.7.2 Samples
• Surveys are conducted using samples. Ideally the sample should reveal generalisations about
the population.
• The sample selected to be surveyed should be chosen without bias, as this may result in a sample that is not
representative of the whole population.
918 Jacaranda Maths Quest 10
For example, the students conducting the investigation decide to choose a sample of 12 fellow students.
Although it would be simplest to choose 12 of their friends as the sample, this would introduce bias since
they would not be representative of the population as a whole.
• A random sample is generally accepted as being an ideal representation of the population from which it
was drawn. However, it must be remembered that different random samples from the same population can
produce different results.
This means that we must be cautious about making predictions about a population, as results of surveys conducted using random samples may vary.
• A sample size must be sufficiently large. As a general rule, the sample size should be about $\sqrt{N}$, where N is the size of the population.
• If the sample size is too small, the conclusions that are drawn from the sample data may not reflect the
population as a whole.
A die was rolled 50 times and the following results were obtained.
6, 5, 3, 1, 6, 2, 3, 6, 2, 5, 3, 4, 1, 3, 2, 6, 4, 5, 5, 4, 3, 1, 2, 1, 6,
4, 5, 2, 3, 6, 1, 5, 3, 3, 2, 4, 1, 4, 2, 3, 2, 6, 3, 4, 6, 2, 1, 2, 4, 2
a. Determine the mean of the population (to 1 decimal place).
b. A suitable sample size for this population would be 7 ($\sqrt{50} \approx 7.1$).
i. Select a random sample of 7 scores, and determine the mean of these scores.
ii. Select a second random sample of 7 scores, and determine the mean of these.
iii. Select a third random sample of 20 scores, and determine the mean of these.
c. Comment on your answers to parts a and b.
THINK WRITE
a. Population mean $= \dfrac{\sum x}{n} = \dfrac{169}{50} \approx 3.4$
b. i. Use a calculator to randomly generate 7 numbers from 1 to 50. Relate these numbers back to the scores, then calculate the mean.
The 7 scores randomly selected are numbers 17, 50, 40, 34, 48, 12, 19 in the set of 50 scores.
These correspond to the scores: 4, 2, 3, 3, 2, 4, 5.
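Random sampling of this kind can also be simulated. The sketch below is a Python illustration, not the calculator method used above: it takes the 50 die rolls as the population, uses the $\sqrt{N}$ rule for the sample size, and draws a random sample without replacement. The seed value 1 is an arbitrary assumption, used only so that the run is repeatable; a different seed gives a different sample and usually a different sample mean.

```python
import random
from math import sqrt
from statistics import mean

# The 50 die rolls, treated as the whole population.
rolls = [6, 5, 3, 1, 6, 2, 3, 6, 2, 5, 3, 4, 1, 3, 2, 6, 4, 5, 5, 4, 3, 1, 2, 1, 6,
         4, 5, 2, 3, 6, 1, 5, 3, 3, 2, 4, 1, 4, 2, 3, 2, 6, 3, 4, 6, 2, 1, 2, 4, 2]

population_mean = mean(rolls)
sample_size = int(sqrt(len(rolls)))   # square root of 50 is about 7.1, so take 7

rng = random.Random(1)                     # fixed seed (arbitrary) for repeatability
sample = rng.sample(rolls, sample_size)    # 7 scores chosen without replacement

print(round(population_mean, 1))           # 3.4
print(sample, round(mean(sample), 2))
```

Running the last two lines with several different seeds shows how much the sample mean can vary around the population mean of 3.4.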
For each of the following situations, state whether the information was obtained by census or survey.
Justify why that particular method was used.
a. A roll call is conducted each morning at school to determine which students are absent.
b. TV ratings are collected from a selection of viewers to discover the popular TV shows.
c. Every hundredth light bulb off an assembly production line is tested to determine the life of
that type of light bulb.
d. A teacher records the examination results of her class.
THINK WRITE
a. Every student is recorded as being present or absent at the roll call.
This is a census. If the roll call only applied to a sample of the students, there would not be an accurate record of attendance at school. A census is essential in this case.
b. Only a selection of the TV audience contributed to these data.
This is a survey. To collect data from the whole viewer population would be time-consuming and expensive. For this reason, it is appropriate to select a sample to conduct the survey.
c. Only 1 bulb in every 100 is tested.
This is a survey. Light bulbs are tested to destruction (burn-out) to determine their life. If every bulb was tested in this way, there would be none left to sell! A survey on a sample is essential.
d. Every student’s result is recorded.
This is a census. It is essential to record the result of every student.
Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: Populations and samples (int-4629)
Sample sizes (int-6183)
Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 4, 5, 8, 11 2, 6, 9, 12 3, 7, 10, 13
Fluency
1. WE17 List some of the problems you might encounter in trying to collect
4. For each of the following, state whether a census or a survey has been used.
a. Two hundred people in a shopping centre are asked to nominate the supermarket where they do most of
their grocery shopping.
b. To find the most popular new car on the road, new car buyers are asked what make and model
they purchased.
c. To find the most popular new car on the road, data are obtained from the transport department.
d. Your Year 10 Maths class completed a series of questions on the amount of maths homework for
Year 10 students.
Understanding
5. To conduct a statistical investigation, Gloria needs to obtain information from 630 students.
a. Determine the appropriate sample size.
b. Describe a method of generating a set of random numbers for this sample.
Reasoning
8. A sampling error is said to occur when results of a sample are different from those of the population from
which the sample was drawn. Discuss some factors which could introduce sampling errors.
9. Since 1961, a census has been conducted in Australia every 5 years. Some people object to the census on the
basis that their privacy is being invaded. Others say that the expense involved could be directed to a better
cause. Others say that a sample could obtain statistics which are just as accurate.
State your views on this. Justify your statements.
10. Australia has a very small population compared with other countries such as China and India. These are the
world’s most populous nations, so the problems we encounter in conducting a census in Australia would be
insignificant compared with those encountered in those countries.
Suggest what different problems authorities would come across when conducting a census in countries with
large populations.
Problem solving
11. The game of Lotto involves picking the same 6 numbers in the range 1 to 45 as have been randomly selected
by a machine containing 45 numbered balls. The balls are mixed thoroughly, then 8 balls are selected
representing the 6 main numbers, plus 2 extra numbers, called supplementary numbers.
Here is a list of the number of times each number had been drawn over a period of time, and also the number
of weeks since each particular number has been drawn.
If these numbers are randomly chosen, explain the differences shown in the tables.
a. Calculate the mean and the median age of the people in this sample.
b. Group the data into class intervals (0–9 etc.) and complete the
frequency distribution table.
c. Use the frequency distribution table to estimate the mean age.
d. Calculate the cumulative frequency and hence plot the ogive.
e. Estimate the median age from the ogive.
f. Compare the mean and median of the original data in part a with the
estimates of the mean and the median obtained for the grouped data in
parts c and e.
g. Were the estimates good enough? Explain your answer.
13. The typing speed (words per minute) was recorded for a group of Year 8 and Year 10 students. The results
are displayed in this back-to-back stem plot.
Key: 2|6 = 26 wpm
Leaf: Year 8 Stem Leaf: Year 10
99 0
9865420 1 79
988642100 2 23689
9776410 3 02455788
86520 4 1258899
5 03578
6 003
Write a report comparing the typing speeds of the two groups.
THINK WRITE
a. No records have been kept on library use. a. Since records are not kept on the library use,
secondary data is not an option.
Therefore, primary data is more appropriate to use
in this case.
State which method would be the most appropriate to collect the following data. Suggest an
alternative method in each case.
a. The number of cars parked in the staff car park each day.
b. The mass of books students carry to school each day.
c. The length a spring stretches when weights are added to it.
d. The cost of mobile phone plans with various network providers.
THINK WRITE
a. Observation a. The best way would probably be observation by visiting the staff car
park to count the number of cars there.
An alternative method would be to conduct a census of all workers to ask
if they parked in the staff car park. This method may be prone to errors
as it relies on accurate reporting by many people.
b. Measurement b. The mass of the books could be measured by weighing each student’s
pack on scales.
A random sample would probably yield a reasonably accurate result.
c. Experiment c. Conduct an experiment and measure the extension of the spring with
various weights.
There are probably no alternatives to this method as results will depend
upon the type of spring used.
d. Internet search d. An internet search would enable data to be collected.
Alternatively, a visit to mobile phone outlets would yield similar results.
The report shows the annual change in median house prices in the local government areas (LGA) of Queensland from 2019–20 to 2020–21.
a. Draw a bar graph which would give the impression that the percentage annual change was much the same throughout the whole state.
b. Construct a bar graph to give the impression that the percentage annual change in Brisbane was far greater than that in the other local government areas.
HOUSES
Suburb/locality          Median house price 2020–21    Median house price 2019–20    Annual change
Brisbane (LGA)           $700,000                      $627,000                      11.6%
Ipswich City (LGA)       $323,000                      $310,000                      4.2%
Redland City (LGA)       $467,500                      $435,000                      7.5%
Logan City (LGA)         $360,000                      $340,000                      5.9%
Moreton Bay (LGA)        $399,000                      $372,000                      7.3%
Gold Coast City (LGA)    $505,000                      $465,000                      8.6%
Toowoomba (LGA)          $360,000                      $334,500                      7.6%
Sunshine Coast (LGA)     $470,000                      $445,000                      5.6%
Fraser Coast (LGA)       $322,500                      $312,500                      3.2%
Bundaberg (LGA)          $282,000                      $275,000                      2.5%
Gladstone (LGA)          $286,000                      $286,000                      0.0%
Rockhampton (LGA)        $267,000                      $254,000                      5.1%
Mackay (LGA)             $398,000                      $383,000                      3.9%
Townsville City (LGA)    $375,000                      $359,000                      4.5%
Cairns (LGA)             $400,000                      $389,000                      2.8%
THINK WRITE/DRAW
a. To flatten out trends, lengthen the horizontal axis and shorten the vertical axis.
[Bar graph: % house price changes in QLD 2019–20 to 2020–21. Vertical axis: Annual % change (0 to 10). Horizontal axis: Area (Brisbane, Ipswich, Redland, Logan, Moreton Bay, Gold Coast, Toowoomba, Sunshine Coast, Fraser Coast, Bundaberg, Gladstone, Rockhampton, Mackay, Townsville, Cairns).]
[Second bar graph. Vertical axis: Annual % change (0 to 8). Horizontal axis: Area (the same 15 areas).]
Consider the data displayed in the table of Worked example 22. Use the data collected for the median
house prices in 2020–21.
a. Explain whether the data would be classed as primary or secondary data.
b. Explain why the data shows median house prices rather than the mean or modal house price.
c. Calculate a measure of central tendency for the data. Explain the reason for this choice.
d. Give a measure of spread of the data, giving a reason for the particular choice.
e. Display the data in a graphical form, explaining why this particular form was chosen.
THINK WRITE
a. These are data that have been collected by someone else.
These are secondary data because they have been collected by someone else.
b. Median is the middle price, mean is the average price, and mode is the most frequently occurring price.
The median price is the middle value. It is not affected by outliers as the mean is. The modal house price may only occur for two house sales with the same value. On the other hand, there may not be any mode. The median price is the most appropriate in this case.
c. Determine the measure of central tendency that is the most appropriate one.
The measures of central tendency are the mean, median and mode. The mean is affected by high values (e.g. $700 000) and low values (e.g. $267 000). These are not typical values, so the mean would not be appropriate. The only repeated price ($360 000) occurs just twice, so the mode is not a useful measure. The median house price is the most suitable measure of central tendency to represent the house prices.
The following data is the heights of the members of the Australian women’s national basketball team
(in metres):
1.73, 1.65, 1.8, 1.83, 1.96, 1.88, 1.63, 1.88, 1.83, 1.88, 1.8, 1.96
Provide calculations and explanations as evidence to verify or refute the following statements.
a. The mean height of the team is greater than their median height.
b. The range of the heights of the 12 players is almost 3 times their interquartile range.
c. Only 5 players are on the court at any one time. A team of 5 players can be chosen such that their
mean, median and modal heights are all the same.
THINK WRITE
a. 1. Calculate the mean height of the 12 players.
Mean $= \dfrac{\sum x}{n} = \dfrac{21.83}{12} = 1.82$ m
2. The median is the average of the 6th and 7th scores.
Median $= \dfrac{1.83 + 1.83}{2} = 1.83$ m
3. Comment on the statement.
The mean is 1.82 m, while the median is 1.83 m. This means that the mean is less than the median, so the statement is refuted.
b. The lower quartile is the average of the 3rd and 4th scores.
Lower quartile $= \dfrac{1.73 + 1.8}{2} = 1.765$ m
The upper quartile is the average of the 3rd and 4th scores from the end.
Upper quartile $= \dfrac{1.88 + 1.88}{2} = 1.88$ m
IQR $= Q_3 - Q_1 = 1.88 - 1.765 = 0.115$ m
$\dfrac{\text{Range}}{\text{IQR}} = \dfrac{0.33}{0.115} \approx 2.9$
c. Mean $= \dfrac{9.4}{5} = 1.88$ m
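The three statements about the basketball team can be tested with a short Python sketch. Two things here are assumptions made for illustration: the helper `quartiles`, which takes the medians of the lower and upper halves of the ordered data (one common quartile convention), and the particular team of five heights chosen for part c, which is one possible selection whose heights sum to 9.4 m.

```python
from statistics import mean, median, mode

heights = [1.73, 1.65, 1.8, 1.83, 1.96, 1.88, 1.63, 1.88, 1.83, 1.88, 1.8, 1.96]


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the data."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


# a. Compare the mean with the median.
print(round(mean(heights), 2), median(heights))        # 1.82 1.83

# b. Compare the range with the interquartile range.
q1, q3 = quartiles(heights)
iqr = q3 - q1
data_range = max(heights) - min(heights)
print(round(data_range / iqr, 1))                      # 2.9

# c. One hypothetical team of 5 whose mean, median and mode all equal 1.88 m.
team = [1.8, 1.88, 1.88, 1.88, 1.96]
print(round(mean(team), 2), median(team), mode(team))  # 1.88 1.88 1.88
```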
This is an excerpt from an article that appeared in a newspaper on Father’s Day. It was reported to be
national survey findings of a Gallup Poll of data from 1255 fathers of children aged 17 and under.
Thinking about all aspects of your life, how happy would you say you are? (%)
I am very happy  26
I am fairly happy  49
Totally happy  75
Some days I’m happy and some days I’m not  21
I am fairly unhappy  3
I am very unhappy  1
Totally unhappy  4

How often, if ever, do you regret having children? (%)
Every day  1
Most days  2
Some days  18
Never  79

Which one of these best describes the impact of having children on your relationship with your partner? (%)
We’re closer than ever  29
We don’t spend as much time together as we should  40
We’re more like friends now than lovers  21
We have drifted apart  6
None of the above  4

Which one of these best describes the allocation of cooking and cleaning duties in your household? (%)
My partner does nothing/I do everything  1
I do most of it  11
We share the cooking and cleaning  42
My partner does most of it  41
I do nothing/my partner does everything  4
None of the above  1

Which of these aspects of your children’s future do you have concerns about? (%)
Their safety  70
Being exposed to drugs  67
Their health  54
Bullying or cyber-bullying  50
Teenage violence  50
Their ability to afford a home  50
Alcohol consumption and binge drinking  47
Achieving academic success  47
Feeling pressured into sex  41
Being able to afford the lifestyle they expect to have  38
Climate change  23
Having them living with you in their mid 20s  14
None of the above  3

What is the best thing about being a dad? (%)
The simple pleasures of family life  61
Enjoying the successes of your kids  24
The unpredictability it brings  9
The comfort of knowing that you will be looked after in later life  3
None of the above  3

Key findings
75% of Aussie dads are totally happy
79% have never regretted having children
67% are worried about their children being exposed to drugs
57% would like more intimacy with their partner
“Work–life balance is definitely an issue for dads in 2010.”
David Briggs
Galaxy principal
THINK
a. How is the sample chosen? Is it truly representative of the population of Australian dads?
WRITE
a. The results of a national survey such as this should reveal the outlook of the whole nation’s dads. There is no indication of how the sample was chosen, so without further knowledge we tend to accept that it is representative of the population. A sample of 1255 is probably large enough.
This article appeared in a newspaper. Read the article, then answer the following questions.
SPONGES ARE TOXIC
Washing dishes can pose a serious health risk, with more than half of all kitchen sponges containing
high levels of dangerous bacteria, research shows.
A new survey dishing the dirt on washing up shows more than 50 per cent of kitchen sponges have
high levels of E. coli, which can cause severe cramps and diarrhoea, and staphylococcus aureus, which
releases toxins that can lead to food poisoning or toxic shock syndrome.
Microbiologist Craig Andrew-Kabilafkas of Australian Food Microbiology said the Westinghouse study of
more than 1000 households revealed germs can spread easily to freshly washed dishes.
The only way to safeguard homes from sickness was to wash utensils at very high temperatures in a
dishwasher.
THINK
a. Look at sample size and selection of sample.
WRITE
a. The report claims that the sample size was more than 1000. There is no indication how the sample was selected.
The point to keep in mind is whether this sample is truly representative of the population consisting of all households. We have no way of knowing.
Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: Evaluating inquiry methods and statistical reports (int-4631)
Compare statistical reports (int-2790)
Individual pathways
PRACTISE: 1, 4, 6, 9, 12
CONSOLIDATE: 2, 7, 10, 13
MASTER: 3, 5, 8, 11, 14
Fluency
1. WE20,21 You have been given an assignment to investigate which Year level has the greatest number of
you have given substantial rises in salary to all your staff. However, profits have not been as spectacular as in
the year before. This table gives the figures for the salary and profits for each quarter.
                       1st quarter   2nd quarter   3rd quarter   4th quarter
Profits ($’000 000)    6             5.9           6             6.5
Salaries ($’000 000)   4             5             6             7
Draw two graphs, one showing profits, the other showing salaries, which will show you in the best possible
light to your shareholders.
3. WE23 The data below were collected from a real estate agent and show the sale prices of ten blocks of land in a new estate.
$150 000, $190 000, $175 000, $150 000, $650 000, $150 000, $165 000, $180 000, $160 000, $180 000
a. Calculate a measure of central tendency for the data. Explain the reason for this choice.
b. Give a measure of spread of the data, giving a reason for the particular choice.
c. Display the data in a graphical form, explaining why this particular form was chosen.
d. The real estate agent advertises the new estate land as:
Own one of these amazing blocks of land for only $150 000 (average)!
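The three common measures of centre give quite different values for the land prices in question 3, which is why an advertised "average" deserves a second look. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original exercise.

```python
from statistics import mean, median, mode

# Sale prices of the ten blocks of land, in dollars (from question 3).
prices = [150_000, 190_000, 175_000, 150_000, 650_000,
          150_000, 165_000, 180_000, 160_000, 180_000]

land_mean = mean(prices)      # pulled upwards by the single $650 000 block
land_median = median(prices)  # middle of the sorted prices
land_mode = mode(prices)      # most frequently occurring price

print(f"mean   = ${land_mean:,.0f}")
print(f"median = ${land_median:,.0f}")
print(f"mode   = ${land_mode:,.0f}")
```

The advertisement's figure of $150 000 matches the mode, the lowest of the three measures, rather than the mean ($215 000) or the median ($170 000).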
(in metres):
1.73, 1.65, 1.8, 1.83, 1.96, 1.88, 1.63, 1.88, 1.83, 1.88, 1.8, 1.96
Provide calculations and explanations as evidence to verify or refute the following statements.
a. The mean height of the team is closer to the lower quartile than it is to the median.
b. Half the players have a height within the interquartile range.
c. Suggest which 5 players could be chosen to have the minimum range in heights.
5. The resting pulse of 20 female athletes was measured and is shown below.
50 62 48 52 71 61 30 45 42 48 43 47 51 52 34 61 44 54 38 40
a. Represent the data in a distribution table using appropriate groupings.
b. Calculate the mean, median and mode of the data.
c. Comment on the similarities and differences between the three values.
6. The batting scores for two cricket players over six innings were recorded as follows.
Player A: 31, 34, 42, 28, 30, 41
Player B: 0, 0, 1, 0, 250, 0
Player B was hailed as a hero for his score of 250.
Comment on the performance of the two players.
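One way to support a comment on the two players is to compare a measure of centre that is sensitive to extreme scores (the mean) with one that is not (the median), alongside the range. The sketch below assumes only the scores listed above; the helper name `summarise` is hypothetical.

```python
from statistics import mean, median

# Batting scores over six innings (from question 6).
player_a = [31, 34, 42, 28, 30, 41]
player_b = [0, 0, 1, 0, 250, 0]

def summarise(scores):
    """Return (mean, median, range) for a list of innings scores."""
    return mean(scores), median(scores), max(scores) - min(scores)

a_mean, a_median, a_range = summarise(player_a)
b_mean, b_median, b_range = summarise(player_b)

print(f"Player A: mean {a_mean:.2f}, median {a_median}, range {a_range}")
print(f"Player B: mean {b_mean:.2f}, median {b_median}, range {b_range}")
```

Player B has the higher mean only because of the single score of 250; the median and range show that Player A scores far more consistently.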
Size   Number sold
4      5
5      7
6      19
7      24
8      16
9      8
10     7
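For data presented as a frequency table like the one above, the mean is the sum of (value × frequency) divided by the total frequency. The sketch below is one way to compute the mean, median and mode from the table; the dictionary name `sold` is an illustrative choice.

```python
from statistics import median, multimode

# Frequency table from above: size -> number sold.
sold = {4: 5, 5: 7, 6: 19, 7: 24, 8: 16, 9: 8, 10: 7}

total_sold = sum(sold.values())
mean_size = sum(size * count for size, count in sold.items()) / total_sold

# Expand the table into a flat list so the median and mode can be read off.
expanded = [size for size, count in sold.items() for _ in range(count)]
median_size = median(expanded)
modal_sizes = multimode(expanded)

print(f"n = {total_sold}, mean = {mean_size:.1f}, "
      f"median = {median_size}, mode = {modal_sizes}")
```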
IT’S A RECORD
• Woolworths posted 10.1% gain in annual profit to $2.02b
• 11th consecutive year of double-digit growth
• Flags 8% to 11% growth in the current financial year
• Sales rose 4.8% to $51.2b
• Wants to increase its share of the fresh food market
• Announced $700m off-market share buyback
• Final fully franked dividend 62¢ a share
10. The graph shows the fluctuation in the Australian dollar in terms of the US dollar during the period 13 July to 13 September 2010.
The higher the Australian dollar, the cheaper it is for Australian companies to import goods from overseas, and the cheaper they should be able to sell their goods to the Australian public.
The manager of Company XYZ produced a graph to support his claim that, because there hasn’t been much change in the Aussie dollar over that period, there hasn’t been any change in the price he sells his imported goods to the Australian public.
Draw a graph that would support his claim. Explain how you were able to achieve this effect.
[Graph: AUSSIE, US¢ (vertical axis from 80.8 to 92.8) from Jul 13 to Sep 13, ending at US 93.29¢. Source: IRESS]
Source: The Courier Mail, 14 Sept. 2010, p. 25.
11. Two brands of light globes were tested by a consumer organisation. They obtained the following results.
Problem solving
12. A small manufacturing plant employs 80 workers. The table below shows the structure of the plant.
[Graph residue: vertical axes in pounds, one marked 80 000 and 81 000, the other marked from 10 000 to 70 000]
Candidate A: 70%
Candidate B: 63%
Candidate C: 60%
12.2 I can calculate the mean, median and mode of data presented as
ungrouped data, frequency distribution tables and grouped data.
12.3 I can calculate the range and interquartile range of a data set.
12.4 I can draw a box plot showing the five-number summary of a data set.
12.5 I can calculate the standard deviation of a small data set by hand.
1. Calculate the mean, median, range and IQR scored for each cricketer.
2. You need to recommend the selection of two of the four cricketers. For each player, write two points as
to why you would or would not select them.
Use statistics in your comments.
Bowling averages
The bowling average is the number of runs scored per wicket taken.
Bowling average = (number of runs scored) ÷ (number of wickets taken)
The smaller the average, the better the bowler has performed.
Josh and Ravi were competing for three bowling awards:
• Best in semifinal
• Best in final
• Best overall
        Semifinal                      Final
        Runs scored   Wickets taken   Runs scored   Wickets taken
Josh    12            5               28            6
Ravi    10            4               15            3
2. Calculate the bowling averages for the following and fill in the table below.
• Semifinal
• Final
• Overall
3. Explain how Ravi can have the better overall average when Josh has the better average in both the
semifinal and final.
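The apparent contradiction in question 3 can be checked numerically. The sketch below assumes the overall average is total runs divided by total wickets across both matches (not the mean of the two match averages); exact fractions are used to avoid rounding, and the names are illustrative.

```python
from fractions import Fraction

# (runs scored, wickets taken) for each bowler in each match, from the table.
results = {
    "Josh": {"semifinal": (12, 5), "final": (28, 6)},
    "Ravi": {"semifinal": (10, 4), "final": (15, 3)},
}

def bowling_average(runs, wickets):
    """Runs conceded per wicket taken; smaller is better."""
    return Fraction(runs, wickets)

averages = {}
for bowler, matches in results.items():
    total_runs = sum(runs for runs, _ in matches.values())
    total_wickets = sum(wickets for _, wickets in matches.values())
    averages[bowler] = {
        "semifinal": bowling_average(*matches["semifinal"]),
        "final": bowling_average(*matches["final"]),
        "overall": bowling_average(total_runs, total_wickets),
    }

for bowler, stages in averages.items():
    for stage, value in stages.items():
        print(f"{bowler:5} {stage:10} {float(value):.2f}")
```

Josh has the lower (better) average in each match, yet Ravi has the lower overall average. Josh took a larger share of his wickets in the final, where both bowlers' averages were worse, so his combined figure is weighted more heavily towards the weaker match.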
Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Crossword (int-2860)
Sudoku puzzle (int-3599)
Fluency
1. List some problems you might encounter in trying to collect data from the following populations.
a. The average number of mL in a can of soft drink
b. The number of fish in a dam
c. The number of workers who catch public transport to work each weekday morning
2. For each of the following investigations, state whether a census or a survey has been used.
a. The average price of petrol in Canberra was estimated by averaging the price at 30 petrol stations in
the area.
b. The performance of a cricketer is measured by looking at his performance in every match he
has played.
c. Public opinion on an issue is sought by a telephone poll of 2000 homes.
a. 7, 15, 8, 8, 20, 14, 8, 10, 12, 6, 19
b. Key: 1|2 = 12
Stem Leaf
1 26
2 178
3 033468
4 01159
5 136
c.
Score (x) Frequency ( f )
70 2
71 6
72 9
73 7
74 4
c. Key: 1|8 = 18
Stem Leaf
1 7889
2 12445777899
3 0001347
6. For each of the following data sets, calculate the interquartile range.
a. 18, 14, 15, 19, 20, 11, 16, 19, 18, 19
4.5 6.2 5.8 4.7 4.0 3.9 6.2 6.8 5.5 6.1
5.9 5.8 5.0 4.3 4.0 4.6 4.8 5.3 4.2 4.8
a. Detail the data on a stem-and-leaf plot. (Use a class size of 0.5 kg.)
b. Prepare a five-point summary of the data.
c. Draw a box plot of the data.
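A five-point summary for the 20 values listed above can be sketched as follows. The quartiles here are taken as the medians of the lower and upper halves of the ordered data, which is an assumption about the intended method; other quartile conventions (including those in some software packages) can give slightly different values. The function name is hypothetical.

```python
from statistics import median

# The 20 data values listed above.
weights = [4.5, 6.2, 5.8, 4.7, 4.0, 3.9, 6.2, 6.8, 5.5, 6.1,
           5.9, 5.8, 5.0, 4.3, 4.0, 4.6, 4.8, 5.3, 4.2, 4.8]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd number of values, the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

summary = five_number_summary(weights)
print(tuple(round(value, 2) for value in summary))
```

The interquartile range is then Q3 − Q1, the width of the box in the box plot.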
9. Calculate the standard deviation of each of the following data sets correct to 1 decimal place.
a. 58, 12, 98, 45, 60, 34, 42, 71, 90, 66
b.
x 1 2 3 4 5
f 2 6 12 8 5
c. Key: 1|4 = 14
Stem Leaf
0 1344578
1 00012245789
2 022357
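Parts a and b of question 9 can be checked with a short script. This sketch assumes the population form of the standard deviation (dividing by n), which is an assumption about the convention intended; for the frequency table, each squared deviation is weighted by its frequency.

```python
from math import sqrt
from statistics import pstdev

# Part a: a plain list of scores.
scores = [58, 12, 98, 45, 60, 34, 42, 71, 90, 66]
sd_a = pstdev(scores)  # population standard deviation (divides by n)

# Part b: frequency table, x -> f.
table = {1: 2, 2: 6, 3: 12, 4: 8, 5: 5}
n = sum(table.values())
mean_b = sum(x * f for x, f in table.items()) / n
variance_b = sum(f * (x - mean_b) ** 2 for x, f in table.items()) / n
sd_b = sqrt(variance_b)

print(f"a. {sd_a:.1f}")
print(f"b. {sd_b:.1f}")
```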
10. MC The Millers obtained a number of quotes on the price of having their home painted. The quotes, to
the nearest hundred dollars, were:
4200, 5100, 4700, 4600, 4800, 5000, 4700, 4900
The standard deviation for this set of data, to the nearest whole dollar, is:
A. 260 B. 278 C. 324
D. 325 E. 900
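Two of the options in question 10 correspond to the two common forms of the standard deviation, so it is worth seeing both side by side. The sketch below uses Python's `statistics.pstdev` (divide by n) and `statistics.stdev` (divide by n − 1); which form a course expects is a matter of convention.

```python
from statistics import pstdev, stdev

# Painting quotes, to the nearest hundred dollars (from question 10).
quotes = [4200, 5100, 4700, 4600, 4800, 5000, 4700, 4900]

population_sd = pstdev(quotes)  # treats the quotes as the whole population
sample_sd = stdev(quotes)       # treats the quotes as a sample

print(f"population SD = {population_sd:.0f}")
print(f"sample SD     = {sample_sd:.0f}")
```

The population form rounds to 260 (option A) and the sample form rounds to 278 (option B).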
11. MC The number of Year 12 students who, during semester 2, spent all their spare periods studying in
14. The box plot shows the heights (in cm) of Year 12 students in a Maths class.
Problem solving
15. MC A data set has a mean of 75 and a standard deviation of 5. Another score of 50 is added to the data
16. MC A data set has a mean of 60 and a standard deviation of 10. A score of 100 is added to the data set.
This score becomes the highest score in the data set. Choose which of the following will increase.
Note: There may be more than one correct answer.
A. Mean
B. Standard deviation
C. Range
D. Interquartile range
E. Median
a. Calculate the mean and the median age of the people in this sample.
b. Group the data into class intervals of 10 (0–9 etc.) and complete the frequency distribution table.
c. Use the frequency distribution table to estimate the mean age.
d. Calculate the cumulative frequency and hence plot the ogive.
e. Estimate the median age from the ogive.
f. Compare the mean and median of the original data in part a with the estimates of the mean and the
median obtained for the grouped data in parts c and e.
g. Determine if the estimates were good enough. Explain your answer.
18. The table below shows the number of cars that are garaged at each house in a certain street each night.
10
9
8
7
Frequency
6
5
4
3
2
1
0 1 2 3 4 5
Score
20. There are 3m values in a data set for which x̄ = m and 𝜎 = m/2.
a. Comment on the changes to the mean and standard deviation if each value of the data set is multiplied by m.
b. An additional value is added to the original data set, giving a new mean of m + 2. Evaluate the additional value.
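A possible line of working for question 20 is sketched below. It reads the data set as containing 3m values (an assumption about the wording, consistent with the answer 7m + 2 listed in the answers) and takes m to be positive so that the scaled standard deviation stays positive.

```latex
\begin{aligned}
\text{(a)}\quad & \bar{x}_{\text{new}} = m \times \bar{x} = m^2,
  \qquad \sigma_{\text{new}} = m \times \sigma = \frac{m^2}{2} \\
\text{(b)}\quad & \text{original total} = 3m \times m = 3m^2 \\
& \text{new total} = (3m + 1)(m + 2) = 3m^2 + 7m + 2 \\
& \text{additional value} = (3m^2 + 7m + 2) - 3m^2 = 7m + 2
\end{aligned}
```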
21. The following data show the number of pets in each of the 12 houses in Coral Avenue, Rosebud.
2, 3, 3, 2, 2, 3, 2, 4, 3, 1, 1, 0
23. The following back-to-back stem-and-leaf plot shows the ages of a group of 30 males and 30 females as
they enter hospital for the first time.
Key: 1 | 7 = 17
Leaf: Male Stem Leaf: Female
98 0 5
998886321 1 77899
87764320 2 00 1 2 4 5 5 6 7 9
86310 3 013358
752 4 2368
53 5 134
6 2
8 7
a. Construct a pair of parallel box plots to represent the two sets of data, showing working out for the
median and 1st and 3rd quartiles.
b. Calculate the mean, range and IQR for both sets of data.
c. Determine any outliers if they exist.
d. Write a short paragraph comparing the data.
24. The test scores, out of a total score of 50, for two classes A and B are shown in the back-to-back
stem-and-leaf plot.
Key: 1 | 4 = 14
Leaf: Class A Stem Leaf: Class B
5 0 124
9753 1 145
97754 2 005
886551 3 155
320 4 157789
0 5 00
a. Ms Vinculum teaches both classes and made the statement that ‘Class A’s performance on the test
showed that the students’ ability was more closely matched than the students’ ability in Class B’.
By calculating the measure of centre, first and third quartiles, and the measure of spread for the test
scores for each class, explain if Ms Vinculum’s statement was correct.
b. Would it be correct to say that Class A performed better on the test than Class B?
Justify your answer by comparing the quartiles and median for each class.
16, 60, 35, 23, 45, 15, 25, 55, 33, 20, 22, 30, 28, 38, 40, 18, 29, 19, 35, 75
The types of TV advertisements during the 6–8 pm time slot were categorised as Fast food,
Supermarkets, Program information and Retail (clothing, sporting goods, furniture). A frequency
table for the frequency of these advertisements is shown below.
Type Frequency
Fast food 7
Supermarkets 5
Program information 3
Retail 5
d. State the type of data that has been collected in the table.
e. Determine the percentage of advertisements that are advertisements for fast food outlets.
f. Suggest a good option for a graphical representation of this type of data.
Speed Frequency
60–64 1
65–69 1
70–74 10
75–79 13
80–84 9
85–89 8
90–94 6
95–99 3
100–104 2
105–109 1
110–114 1
Total 55
a. By calculating the midpoint for each class interval, determine the mean speed, in km/h, of the cars
travelling along the road.
Write your answer correct to 2 decimal places.
b. The speed limit along the road is 75 km/h. A speed camera is set to photograph the license plates of cars travelling 7% more than the speed limit.
A speeding fine is automatically sent to the owners of the cars photographed.
Based on the 55 cars recorded, determine the number of speeding fines that were issued.
c. Drivers of cars travelling 5 km/h up to 15 km/h over the speed limit are fined $135. Drivers of cars travelling more than 15 km/h and up to 25 km/h over the speed limit are fined $165 and drivers of cars recorded travelling more than 25 km/h and up to 35 km/h are fined $250.
Drivers travelling more than 35 km/h pay a $250 fine in addition to having their driver’s license suspended.
Assume that this data is representative of the speeding habits of drivers along a major road and there
are 30 000 cars travelling along this road on any given month.
i. Determine the amount, in dollars, collected in fines throughout the month. Write your answer
correct to the nearest cent.
ii. Evaluate the number of drivers that would expect to have their licenses suspended throughout
the month.
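The grouped-data mean asked for in part a can be sketched as follows: each class interval is represented by its midpoint, and the midpoints are weighted by the class frequencies. The midpoint is taken here as the average of the two listed class limits (for example, 62 for 60–64), which is an assumption about the intended convention.

```python
# Speed table from above: (lower limit, upper limit) -> frequency.
speed_table = {
    (60, 64): 1, (65, 69): 1, (70, 74): 10, (75, 79): 13,
    (80, 84): 9, (85, 89): 8, (90, 94): 6, (95, 99): 3,
    (100, 104): 2, (105, 109): 1, (110, 114): 1,
}

total_cars = sum(speed_table.values())

# Sum of (class midpoint x frequency) over all class intervals.
weighted_sum = sum(((low + high) / 2) * freq
                   for (low, high), freq in speed_table.items())

mean_speed = weighted_sum / total_cars
print(f"{total_cars} cars, mean speed = {mean_speed:.2f} km/h")
```

Because every car in a class is treated as travelling at the midpoint speed, the result is an estimate of the mean rather than the exact mean of the raw speeds.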
To test your understanding and knowledge of this topic, go to your learnON title at
www.jacplus.com.au and complete the post-test.
11.
12. D
13. B
14. The mean typing speed is 26.53 and IQR is 19 for Year 8. The mean typing speed is 40.53 and IQR is 20 for Year 10. This suggests that the mean typing speed for Year 10 is greater than that of the Year 8 students. The interquartile range is not the same for both Year 8 and Year 10.
15. E
12.2 Measures of central tendency
1. a. 7 b. 8 c. 8
2. a. 6.875 b. 7 c. 4, 7
3. a. 39.125 b. 44.5 c. No mode
4. a. 4.857 b. 4.8 c. 4.8
[Ogive: cumulative frequency (0 to 30) against Amount spent ($) (0 to 80)]
d. The mean is slightly underestimated; the median is exact. The estimate is good enough as it provides a guide only to the amount that may be spent by future customers.
15. a. 3
b. 4, 5, 5, 5, 6 (one possible solution)
c. One possible solution is to exchange 15 with 20.
16. a. Frequency column: 16, 6, 4, 2, 1, 1
15
c. Median
15
10. a. 70 c. 10
16
65− < 70
11. 124.83 5
12.
13. a. B b. B 0
7.5 22.5 37.5 52.5 67.5 82.5
e. 15 − <30
d. 44.1
f. 15 − <30
[Ogive: cumulative frequency (0 to 24, with 50% and 100% marked) against Age (0 to 90)]
h. 28
[Ogive: cumulative frequency (0 to 35) against Battery life (h) (50 to 80)]
i. No
Q1 = 58, Q3 = 67
b. i. 62.5
j. Sample responses can be found in the worked solutions in the online resources.
ii.
iii. 9
8. IQR = 28
v. 6
d. Player A
e. Player A is more consistent. One large score can distort the mean.
20. Sample responses can be found in the worked solutions in the online resources.
21. a. Frequency column: 3, 8, 5, 3, 1
b. 50.5
c. 40–<50
d. 40–<50
e. [Ogive of pulse rate of female athletes: cumulative frequency (0 to 20) and cumulative frequency (%) (0% to 100%) against Beats per minute (30 to 70)]
f. Approximately 48 beats/min
22. Answers will vary. Sample responses include:
a. 3, 4, 5, 5, 8
b. 4, 4, 5, 7, 10
c. 2, 3, 6, 6, 12
23. 12
24. (2a + b)/3
25. 13, 31, 31, 47, 53, 59
[Ogive: cumulative frequency (0 to 55) against Class interval (120 to 200)]
Range = 23
b. i. Range = 45
c. i. Range = 49
ii. IQR = 20
10. Measures of spread tell us how far apart the values (scores) are from one another.
11. a. 25.5
b. 28
c. 39
d. 6
e. The three lower scores affect the mean but not the median or mode.
mean = 32.3; median = 32.5; range = 38; IQR = 14
12. a. Men:
IQR = 3.4
4. 22 cm
Therefore, the 6th score must be 6. This will maintain the range, Q1, Q3 and median.
Both graphs indicate that the data is slightly negatively skewed. However, the box plot provides an excellent summary of the centre and spread of the distribution.
12.4 Box plots
1. a. 5 b. 26
2. a. 6 b. 27
3. a. 5.8 b. 18.6
4. a. 140 b. 55 c. 90 d. 85 e. 26
5. a. 58 b. 31 c. 43 d. 27 e. 15
6. B
7. C
8. D, E
9. a. 22, 28, 35, 43, 48
b. [Box plot: Sales, axis from 20 to 50]
[Box plot: Rainfall (mm), axis from 0 to 50]
11. a. (18, 20, 26, 43.5, 74)
b. [Box plot: Age, axis from 10 to 70]
c. The distribution is positively skewed, with most of the offenders being young drivers.
12. a. (124 000, 135 000, 148 000, 157 000, 175 000)
b. [Box plot: ($ × 1000), axis from 120 to 180]
13. a. [Box plot: axis from 0 to 7]
Both graphs indicate that the data is slightly positively skewed. However, the box plot provides an excellent summary of the centre and spread of the distribution.
14. [Histogram: Frequency (2 to 10) against Number of passengers on bus journeys (0 to 140)]
15. a.
         Xmin   Q1   Median   Q3      Xmax
Before   75     86   95       128.5   152
After    66     81   87       116     134
b. [Parallel box plots: After and Before, axis from 60 to 160]
c. As a whole, the program was effective. The median weight dropped from 95 kg to 87 kg, a loss of 8 kg. A noticeable shift in the graph shows that after the program 50% of participants weighed between 66 and 87 kg, compared to 25% of participants weighing between 75 and 86 kg before they started.
Before the program, the range of weights was 77 kg (from 75 kg to 152 kg); after the program, the range had decreased to 68 kg. The IQR also diminished from 42.5 kg to 35 kg.
16. The advantages of box plots are that they are clear visual representations of the 5-number summary, display outliers and can handle a large volume of data. The disadvantage is that individual scores are lost.
[Box plot: Age, axis from 15 to 45]
c. The distribution is positively skewed, with first-time mothers being under the age of 30. There is one outlier (48) in this group.
19. f
to 𝜎 ≈ 10.73.
c. The mean is tripled and the standard deviation is tripled.
14. The standard deviation will decrease because the average distance to the mean has decreased.
15. 57 is two standard deviations above the mean.
16. a. New mean is the old mean increased by 3 (15) but no change to the standard deviation.
b. New mean is 3 times the old mean (36) and new standard deviation is 3 times the old standard deviation (12).
17. a. 43 b. 12 c. 12.19
b. i. Under 20 − (20–24): 750 seconds difference Morning: median = 2.45; afternoon: median = 1.6
22. a. 9625 seconds second.
ii. Under 20 − (25–29): 500 seconds difference Morning: range = 3.8; afternoon: range = 5
2. a.
11. Sample responses can be found in the worked solutions in the online resources.
1. a. When was it first put into the machine? How old was the battery before being purchased? How frequently has the computer been used on battery?
2. Sample responses can be found in the worked solutions in the online resources.
3. a. Census. The airline must have a record of every passenger on every flight.
b. Survey. It would be impossible to interview everyone.
Mean = 31.83
c.
d. [Ogive: cumulative frequency against Age; Median = 30]
e.
f. Estimates from parts c and e were fairly accurate.
g. Yes, they were fairly close to the mean and median of the raw data.
[Graph: vertical axis from 5.9 to 6.4 against Quarter (0 to 4)]
[Graph: Company salaries; Mean salaries, vertical axis from 0 to 15]
[Axis residue: 150 000, 300 000, 450 000, 600 000]
it was not surprising that the mode and median were contained there also. The mean was slightly higher, and this would have been influenced by the one reading in the 70−79 group.
6. Player B appears to be the better player if the mean result is used. However, Player A is the more consistent player.
7. a. 7.1 b. 7 c. 7
d. The mode has the most meaning as this size sells the most.
8. Points that could be mentioned include:
10.1% is only just ‘double digit’ growth.
declining since 2008.
The share price has rebounded, but not to its previous high.
median = 1.83 m
4. a. 70 42 35 20 7 30 35 60
4 00023
4* 5678
5 03
5* 5889
6 122
6* 8
b. (3.9, 4.4, 4.9, 5.85, 6.8)
c. [Box plot: axis from 3.5 to 6.5 kg]
9. a. 24.4 b. 1.1 c. 7.3
10. A
[Histogram: Frequency (1 to 6) against Number of cars (0 to 5)]
b. Positively skewed: a greater number of scores is distributed at the lower end of the distribution.
19. a. Yes b. Yes. Both are 3.
c. 3
20. a. x̄ = m², 𝜎 = m²/2 b. 7m + 2
14.
b. Range = 25 cm
c. IQR = 5 cm
15. C
16. A, B and C
17. a. Mean = 32.03; median = 29.5
b.
Class interval   Frequency
0–9              2
10–19            7
20–29            6
30–39            6
40–49            3
50–59            3
60–69            3
Total            30
c.
d. [Ogive: cumulative frequency (0 to 30) against Amount spent ($) (0 to 80); Median = 30]
e.
f. Estimates from parts c and e were fairly accurate.
g. Yes, they were fairly close to the mean and median of the raw data.
however has increased because this large value will change the average of the numbers. The mean is used as a measure of central tendency if there are no outliers or if the data are symmetrical. The median is used as a measure of central tendency if there are outliers or the data are skewed.
22. a.
Interval   Frequency (f)   Midpoint × (f)
40–49      1               44.5 × 1 = 44.5
50–59      1               54.5 × 1 = 54.5
60–69      1               64.5 × 1 = 64.5
70–79      2               74.5 × 2 = 149
80–89      4               84.5 × 4 = 338
90–99      4               94.5 × 4 = 378
100–109    8               104.5 × 8 = 836
110–119    6               114.5 × 6 = 687
120–129    8               124.5 × 8 = 996
130–139    2               134.5 × 2 = 269
140–149    2               144.5 × 2 = 289
150–159    0               154.5 × 0 = 0
160–169    1               164.5 × 1 = 164.5
Total      40              4270
b. 106.75
c. 107.15
d. The differences in this case were minimal; however, the grouped data mean is not based on the actual data but on the frequency in each interval and the interval midpoint. It is unlikely to yield an identical value to the actual mean. The spread of the scores within the class interval has a great effect on the grouped data mean.
[Box plot residue: Males; Age axis from 0 to 80]
b.
         Males   Females
Mean     28.2    31.1
Range    70      57
IQR      18      22
[Axis residue: t, from 15 to 75]
c. i. 25%
ii. 50%
iii. 75%
d. Categorical
e. 35%
f. Pictogram, pie chart or bar chart.
26. a. 82.73 km/h
b. 30 cars
c. i. $2 607 272.73