
c12UnivariateData (1) (1)

This document covers univariate data analysis, including measures of central tendency (mean, median, mode) and measures of spread. It emphasizes the importance of understanding statistics to avoid manipulation and make informed decisions, particularly in the context of real-world applications like government policies and media reporting. The document also includes exercises and examples to help students practice calculating statistical measures.

Uploaded by

748nqf8hkm
Copyright
© © All Rights Reserved

12 Univariate data

LESSON SEQUENCE
12.1 Overview
12.2 Measures of central tendency
12.3 Measures of spread
12.4 Box plots
12.5 The standard deviation (Optional)
12.6 Comparing data sets
12.7 Populations and samples
12.8 Evaluating inquiry methods and statistical reports
12.9 Review
LESSON
12.1 Overview
Why learn this?
According to the novelist Mark Twain, ‘There are three kinds of lies: lies, damned lies
and statistics.’ Statistics can easily be used to manipulate people unless they have an
understanding of the basic concepts involved.
Statistics, when used properly, can be an invaluable aid to good decision-making.
However, deliberate distortion of the data or meaningless pictures can be used to
support almost any claim or point of view. Whenever you read an advertisement, hear
a news report or are given some data by a friend, you need to have a healthy degree
of scepticism about the reliability of the source and nature of the data presented. A
solid understanding of statistics is crucially important, as it is very easy to fall prey to
statistics that are designed to confuse and mislead.
In 2020 when the COVID-19 pandemic hit, news and all forms of media were flooded with statistics. These
statistics were used to inform governments worldwide about infection rates, recovery rates and all sorts of other
important information. These statistics guided the decision-making process in determining the restrictions that
were imposed or relaxed to maintain a safe community.
Statistics are also used to provide more information about a population in order to inform government policies.
For example, the results of a census might indicate that the people in a particular city are fed up with traffic
congestion. With this information now known, the government might prioritise works on public roads, or
increase funding of public transport to try to create a more viable alternative to driving.


860 Jacaranda Maths Quest 10


Exercise 12.1 Pre-test

1. The following data show the number of cars in each of the 12 houses along a street.

2, 3, 3, 2, 2, 3, 2, 4, 3, 1, 1, 0

Calculate the median number of cars.

2. Calculate the range of the following data set: 5, 15, 23, 6, 31, 24, 26, 14, 12, 34, 18, 9, 17, 32.

3. The frequency table shows the scores obtained by 102 professional golfers in the final round of
a tournament.

Score Frequency
67 2
68 6
69 7
70 11
71 16
72 23
73 17
74 11
75 9

Identify the modal score.

4. A sample of 15 people was selected at random from those attending a local swimming pool. Their ages
(in years) were recorded as follows.

19, 7, 83, 41, 17, 23, 62, 55, 15, 25, 32, 29, 11, 18, 10

Calculate the mean age of people attending the swimming pool, correct to 1 decimal place.

5. Complete the following sentence.


A sample is a _________ of a population.

6. At Einstein Secondary School a Year 10 mathematics class has 22 students. The following were the test
scores for the class.

34, 47, 54, 59, 60, 63, 66, 69, 73, 77, 78

78, 79, 80, 82, 83, 85, 86, 88, 89, 90, 91

Calculate the interquartile range (IQR).

7. The mean of a set of five scores is 11.8. If four of the scores are 17, 9, 14 and 6, calculate the fifth score.

TOPIC 12 Univariate data 861


8. The box plot below shows the price in dollars of a meal for one person from ten fast-food shops.

[Box plot: price of a meal in dollars; horizontal axis from 6 to 14.]

State whether the data is negatively skewed, positively skewed or symmetrical.

9. A frequency table for the time taken by 20 people to put together an item of flat-pack furniture
is shown.

Time taken (min) Frequency


0–4 1
5–9 3
10–14 5
15–19 2
20–24 4
25–29 2
30–34 2
35–39 1

Calculate the cumulative frequency for putting together an item of flat-pack furniture in less than 20 minutes.

10. MC The frequency table below shows the scores obtained by 100 professional golfers in the final round
of a tournament.

Score Frequency
67 2
68 5
69 8
70 11
71 16
72 22
73 14
74 13
75 9

Select the median score.


A. 71 B. 71.75 C. 72 D. 72.5 E. 73

11. The heights of six basketball players (in cm) are:

178.1 185.6 173.3 193.4 183.1 193.0

Calculate the mean and standard deviation.



12. MC A group of 22 people recorded how many cans of soft drink they drank in a day. The table shows

the number of cans drunk by each person.

0 2 2 2 1 1 3 4 4 2 1
2 4 1 6 3 3 5 4 1 2 5

Select the statement that is not true.


A. The maximum number of soft drink cans drunk is 6.
B. The minimum number of soft drink cans drunk is 0.
C. The interquartile range is 3.
D. The median number of soft drink cans is 2.5.
E. The mean number of soft drink cans drunk is 2.64.

13. MC Select the approximate median in the cumulative frequency percentage graph shown.

[Graph: cumulative frequency percentage curve. Vertical axis: Cumulative frequency, 0 to 100; horizontal axis: Mass (g), 10 to 70.]

A. 30 B. 32 C. 40 D. 50 E. 92

14. MC The following back-to-back stem-and-leaf plot shows the typing speed in words per minute (wpm) of 30 Year 8 and Year 10 students.

Key: 2|6 = 26 wpm

Leaf: Year 8 | Stem | Leaf: Year 10
9 2 2 | 0 |
9 8 6 5 4 2 | 1 | 4 9
8 8 8 6 4 2 1 0 0 | 2 | 2 3 6 8 9
9 7 7 6 4 1 0 | 3 | 0 3 4 5 5 7 8 8
7 6 5 2 0 | 4 | 1 2 5 8 8 9 9
 | 5 | 0 3 5 7 8
 | 6 | 0 0 1
6 001
Calculate the mean typing speed and interquartile range for Year 8 and Year 10. Comment on
your answers.



15. MC A survey was conducted on the favourite take-away foods for university students and the results were graphed using two bar charts.

[Graph 1: Favourite take-away foods. Bar chart; vertical axis: Number of votes, 950 to 990; horizontal axis: Favourite food (Hamburgers, Pizza, Chicken wings).]

[Graph 2: Favourite take-away foods. Bar chart; vertical axis: Number of votes, 0 to 1200; horizontal axis: Favourite food (Hamburgers, Pizza, Chicken wings).]

Select the statement that best describes these graphs.


A. Graph 1 is misleading as it suggests that students like hamburgers 7 times more than
chicken wings.
B. Graph 1 is misleading as it suggests that students like pizza 3 times more than hamburgers.
C. Graph 1 is misleading as it suggests that students like pizza four times more than chicken wings.
D. Graph 2 is misleading as it suggests that students like all take-away foods evenly.
E. Graph 1 is misleading as the vertical axis does not start at zero.



LESSON
12.2 Measures of central tendency
LEARNING INTENTION
At the end of this lesson you should be able to:
• calculate the mean, median and mode of data presented as ungrouped data, frequency distribution tables
and grouped data.

12.2.1 Mean, median and mode of univariate data


• Univariate data are data with one variable, for example the heights of Year 10 students.
• Measures of central tendency are summary statistics that measure the centre of the data. These are known
as the mean, median and mode.
• The mean is the average of all observations in a set of data.
• The median is the middle observation in an ordered set of data.
• The mode is the most frequent observation in a data set.

The mean
• The mean of a set of data is what is referred to in everyday language as the average.
• The mean of a set of values is the sum of all the values divided by the number of values.
• The symbol we use to represent the mean is $\bar{x}$; that is, a lower-case x with a bar on top.

Calculating the mean

$$\text{Mean} = \frac{\text{sum of data values}}{\text{total number of data values}}$$

Using mathematical notation, this is written as:

$$\bar{x} = \frac{\sum x}{n}$$

The median
• The median represents the middle score when the data values are in ascending order such that an equal
number of data values will lie below the median and above it.

Calculating the median


When calculating the median, use the following steps:

1. Arrange the data values in order.
2. The position of the median is the $\left(\frac{n+1}{2}\right)$th data value, where n is the total number of data values.
Note: If there is an even number of data values, there will be two middle values. In this case the median is the average of those two data values.



• When there are an odd number of data values, the median is the middle value.

1 1 3 4 6 7 8

median = 4

• When there are an even number of data values, the median is the average of the two middle values.

2 3 3 5 6 6 7 9

$$\text{median} = \frac{5 + 6}{2} = 5.5$$

The mode
• The mode is the observation that occurs most often.
• The data set can have no modes, one mode, two modes (bimodal) or more than two modes (multimodal).

Identifying the mode


When identifying the mode, look for the number that occurs most often (has the highest frequency).

• If no value in a data set appears more than once then there is no mode.
• If several values in a data set share the highest frequency, then the data set has multiple modes. Each of those values is a mode.
For example, the set 1, 2, 2, 4, 5, 5, 7 has two modes, 2 and 5.
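The three definitions above can be sketched in Python. This is a minimal illustration rather than a prescribed method: the function names `mean`, `median` and `modes` are chosen here for the example, and `modes` follows this lesson's convention of returning no mode when no value repeats.

```python
from collections import Counter


def mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data.

    With an even number of values there are two middle values,
    so the median is their average.
    """
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def modes(values):
    """All values with the highest frequency.

    Returns an empty list when no value appears more than once
    (the 'no mode' case described in this lesson).
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(median([1, 1, 3, 4, 6, 7, 8]))      # 4
print(median([2, 3, 3, 5, 6, 6, 7, 9]))   # 5.5
print(modes([1, 2, 2, 4, 5, 5, 7]))       # [2, 5]
```

The three calls reuse the small data sets from the median and mode examples above. The functions assume a non-empty list of numbers.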

WORKED EXAMPLE 1 Calculating mean, median and mode

For the data 6, 2, 4, 3, 4, 5, 4, 5, calculate:


a. the mean b. the median c. the mode.

THINK | WRITE
a. 1. Calculate the sum of the scores; that is, $\sum x$. | a. $\sum x = 6 + 2 + 4 + 3 + 4 + 5 + 4 + 5 = 33$
2. Count the number of scores; that is, n. | $n = 8$
3. Write the rule for the mean. | $\bar{x} = \frac{\sum x}{n}$
4. Substitute the known values into the rule. | $= \frac{33}{8}$
5. Evaluate. | $= 4.125$
6. Write the answer. | The mean is 4.125.
b. 1. Write the scores in ascending numerical order. | b. 2, 3, 4, 4, 4, 5, 5, 6
2. Locate the position of the median using the rule $\frac{n+1}{2}$, where n = 8. This places the median as the 4.5th score; that is, between the 4th and 5th score. | Median $= \frac{n+1}{2}$th score $= \frac{8+1}{2}$th score $= 4.5$th score
3. Obtain the average of the two middle scores. | Median $= \frac{4+4}{2} = \frac{8}{2} = 4$
4. Write the answer. | The median is 4.
c. 1. Systematically work through the set and make note of any repeated values (scores). Identify the most frequently occurring observation. | c. 2, 3, 4, 4, 4, 5, 5, 6
2. Write the answer. | The mode is 4.

TI | THINK
1. In a new document, on a Lists & Spreadsheet page, label column A as ‘xvalues’, and enter the values in the data set. Press ENTER after entering each value.
2. Although you can find many summary statistics, to find the mean only, open a Calculator page and press:
• MENU
• 6: Statistics
• 3: List Math
• 3: Mean
Press VAR and select ‘xvalues’, then press ENTER.
To calculate the median only, press:
• MENU
• 6: Statistics
• 3: List Math
• 4: Median
Press VAR and select ‘xvalues’, then press ENTER.
DISPLAY/WRITE: The mean is 4.125 and the median is 4. The mode is 4.

CASIO | THINK
1. On the Statistics screen, label list1 as ‘x’ and enter the values as shown in the table. Press EXE after entering each value.
2. To calculate the mean, median and mode, tap:
• Calc
• One-Variable
Set values as:
• XList: main\x
• Freq: 1
Tap OK. The mean is at the top ($\bar{x}$). Scroll down to find the median and mode.
DISPLAY/WRITE: The mean is 4.125 and the median is 4. The mode is 4.



12.2.2 Calculating mean, median and mode from a frequency distribution table
• If data is provided in the form of a frequency distribution table we can determine the mean, median and
mode using slightly different methods.
• The mode is the score with the highest frequency.

• To calculate the median, add a cumulative frequency column to the table and use it to determine the score that is the $\left(\frac{n+1}{2}\right)$th data value.
• To calculate the mean, add a column that is the score multiplied by its frequency, f × x. The following formula can then be used to calculate the mean, where ∑(f × x) is the sum of the (f × x) column. ∑ is the uppercase Greek letter sigma.

Calculating the mean from a frequency table

$$\bar{x} = \frac{\sum (f \times x)}{n}$$

WORKED EXAMPLE 2 Calculations from a frequency distribution table

Using the frequency distribution table, calculate:


a. the mean b. the median c. the mode.

Score (x) Frequency ( f)


4 1
5 2
6 5
7 4
8 3
Total 15

THINK | WRITE
1. Rule up a table with four columns titled Score (x), Frequency (f), Frequency × score (f × x) and Cumulative frequency (cf).
2. Enter the data and complete both the f × x and cumulative frequency columns.

Score (x) | Frequency (f) | Frequency × score (f × x) | Cumulative frequency (cf)
4 | 1 | 4 | 1
5 | 2 | 10 | 1 + 2 = 3
6 | 5 | 30 | 3 + 5 = 8
7 | 4 | 28 | 8 + 4 = 12
8 | 3 | 24 | 12 + 3 = 15
 | n = 15 | ∑(f × x) = 96 |

a. 1. Write the rule for the mean. | a. $\bar{x} = \frac{\sum (f \times x)}{n}$
2. Substitute the known values into the rule and evaluate. | $\bar{x} = \frac{96}{15} = 6.4$
3. Write the answer. | The mean of the data set is 6.4.
b. 1. Locate the position of the median using the rule $\frac{n+1}{2}$, where n = 15. This places the median as the 8th score. | b. The median is the $\left(\frac{15+1}{2}\right)$th or 8th score.
2. Use the cumulative frequency column to find the 8th score and write the answer. | The median of the data set is 6.
c. 1. The mode is the score with the highest frequency. | c. The score with the highest frequency is 6.
2. Write the answer. | The mode of the data set is 6.
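The frequency-table method can also be sketched in Python. The function name `frequency_table_stats` and the list-of-pairs layout are assumptions made for this illustration; the logic mirrors the f × x column and the cumulative frequency column used in the worked example.

```python
def frequency_table_stats(table):
    """Mean, median and mode(s) from (score, frequency) pairs sorted by score."""
    n = sum(f for _, f in table)

    # Mean: sum of the (f x score) column divided by n.
    mean = sum(x * f for x, f in table) / n

    def score_at(position):
        """Score at a 1-based position, found by walking the cumulative frequency."""
        cumulative = 0
        for x, f in table:
            cumulative += f
            if position <= cumulative:
                return x
        raise ValueError("position is beyond the end of the data")

    # Median: the ((n + 1) / 2)th score, or the average of the two
    # middle scores when n is even.
    if n % 2 == 1:
        median = score_at((n + 1) // 2)
    else:
        median = (score_at(n // 2) + score_at(n // 2 + 1)) / 2

    # Mode: every score that has the highest frequency.
    highest = max(f for _, f in table)
    modes = [x for x, f in table if f == highest]
    return mean, median, modes


# Data from the frequency distribution table in Worked example 2.
table = [(4, 1), (5, 2), (6, 5), (7, 4), (8, 3)]
mean, median, modes = frequency_table_stats(table)
print(mean, median, modes)  # 6.4 6 [6]
```

The sketch assumes the table is already sorted by score and that every frequency is a non-negative whole number.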

TI | THINK
1. In a new problem, on a Lists & Spreadsheet page, label column A as ‘score’ and column B as ‘f’. Enter the values as shown in the table and press ENTER after entering each value.
2. To find the summary statistics, press:
• MENU
• 4: Statistics
• 1: Stat Calculations
• 1: One-Variable Statistics...
Select 1 as the number of lists. Then on the One-Variable Statistics page select ‘score’ as the X1 List and ‘f’ as the Frequency List. Leave the next fields empty, TAB to OK and press ENTER.
3. The results are displayed.
DISPLAY/WRITE: The mean is $\bar{x} = 6.4$ and the median is 6. The mode is the score with the highest frequency, which in this case is 6.

CASIO | THINK
1. On the Statistics screen, label list1 as ‘score’ and list2 as ‘f’, and enter the values as shown in the table. Press EXE after entering each value.
2. To find the summary statistics, tap:
• Calc
• One-Variable
Set values as:
• XList: main\score
• Freq: main\f
3. Tap OK. The mean is at the top ($\bar{x}$). Scroll down to find the median and mode.
DISPLAY/WRITE: The mean is $\bar{x} = 6.4$ and the median is 6. The mode is the score with the highest frequency, which is 6.

Mean, median and mode of grouped data


• When the data are grouped into class intervals, the actual values (or data) are lost. In such cases we have to
approximate the real values with the midpoints of the intervals into which these values fall.
For example, if in a grouped frequency table showing the heights of different students, 4 students had a
height between 180 and 185 cm, we assume that each of those 4 students is 182.5 cm tall.

Mean
• The formula for calculating the mean is the same as the formula used when the data is displayed in a
frequency distribution table:
$$\bar{x} = \frac{\sum (f \times x)}{n}$$
Here, x represents the midpoint (or class centre) of each class interval, f is the corresponding frequency and
n is the total number of observations in a set.

Median
• The median is found by drawing a cumulative frequency curve (ogive) of the data and estimating the
median from the 50th percentile (see section 12.2.3).

Modal class
• The modal class is the class interval that has the highest frequency.

12.2.3 Cumulative frequency curves (ogives)


Ogives
• Data from a cumulative frequency table can be plotted to form a cumulative frequency curve (sometimes
referred to as cumulative frequency polygon), which is also called an ogive (pronounced ‘oh-jive’).
• To plot an ogive for data that is in class intervals, the maximum value for the class interval is used as the
value against which the cumulative frequency is plotted.



Quantiles
• An ogive can be used to divide the data into any given number of equal parts called quantiles.
• Quantiles are named after the number of parts that the data are divided into.
• Percentiles divide the data into 100 equal-sized parts.
• Quartiles divide the data into 4 equal-sized parts. For example, 25% of the data values lie at or below the
first quartile.

Percentile Quartile and symbol Common name


25th percentile First quartile, Q1 Lower quartile
50th percentile Second quartile, Q2 Median
75th percentile Third quartile, Q3 Upper quartile
100th percentile Fourth quartile, Q4 Maximum

• A percentile is named after the percentage of data that lies at or below that value.
For example, 60% of the data values lie at or below the 60th percentile.
• Percentiles can be read off a percentage cumulative frequency curve.
• A percentage cumulative frequency curve is created by:
• writing the cumulative frequencies as a percentage of the total number of data values
• plotting the percentage cumulative frequencies against the maximum value for each interval.
• For example, the following table and graph show the mass of cartons of eggs ranging from 55 g to 65 g.

Mass (g) | Frequency (f) | Cumulative frequency (cf) | Percentage cumulative frequency (%cf)
55–<57 | 2 | 2 | 6%
57–<59 | 6 | 2 + 6 = 8 | 22%
59–<61 | 12 | 8 + 12 = 20 | 56%
61–<63 | 11 | 20 + 11 = 31 | 86%
63–<65 | 5 | 31 + 5 = 36 | 100%

[Graph: percentage cumulative frequency curve. Vertical axis: Percentage cumulative frequency, 0 to 100; horizontal axis: Mass (g), 55 to 66.]
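As a rough check of the egg-mass table, the cumulative and percentage cumulative frequency columns can be computed with a short Python sketch. The function name `percentage_cumulative` is an assumption for this example, and percentages are rounded to the nearest whole number to match the table.

```python
from itertools import accumulate


def percentage_cumulative(frequencies):
    """Return the cumulative frequencies and their percentages of the total."""
    cumulative = list(accumulate(frequencies))
    total = cumulative[-1]
    percent = [round(100 * cf / total) for cf in cumulative]
    return cumulative, percent


# Frequencies for the classes 55-<57, 57-<59, 59-<61, 61-<63 and 63-<65.
frequencies = [2, 6, 12, 11, 5]
# The maximum value of each class interval, used as the horizontal coordinate.
upper_values = [57, 59, 61, 63, 65]

cumulative, percent = percentage_cumulative(frequencies)
print(cumulative)  # [2, 8, 20, 31, 36]
print(percent)     # [6, 22, 56, 86, 100]

# Points to plot for the percentage cumulative frequency curve.
points = list(zip(upper_values, percent))
print(points)
```

Each point pairs the maximum value of a class interval with its percentage cumulative frequency, which is how the curve is plotted in this lesson.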


WORKED EXAMPLE 3 Estimating mean, median and modal class in grouped data

For the given data:

Class interval Frequency
60–<70 5
70–<80 7
80–<90 10
90–<100 12
100–<110 8
110–<120 3
Total 45

a. estimate the mean b. estimate the median c. determine the modal class.

THINK | WRITE
1. Draw up a table with 5 columns headed Class interval, Class centre (x), Frequency (f), Frequency × class centre (f × x) and Cumulative frequency (cf).
2. Complete the x, f, f × x and cf columns.

Class interval | Class centre (x) | Freq. (f) | Frequency × class centre (f × x) | Cumulative frequency (cf)
60–<70 | 65 | 5 | 325 | 5
70–<80 | 75 | 7 | 525 | 12
80–<90 | 85 | 10 | 850 | 22
90–<100 | 95 | 12 | 1140 | 34
100–<110 | 105 | 8 | 840 | 42
110–<120 | 115 | 3 | 345 | 45
 | | n = 45 | ∑(f × x) = 4025 |

a. 1. Write the rule for the mean. | a. $\bar{x} = \frac{\sum (f \times x)}{n}$
2. Substitute the known values into the rule and evaluate. | $\bar{x} = \frac{4025}{45} \approx 89.4$
3. Write the answer. | The mean for the given data is approximately 89.4.
b. 1. Draw a combined cumulative frequency histogram and ogive, labelling class centres on the horizontal axis and cumulative frequency on the vertical axis. Join the end points of each class interval with a straight line to form the ogive. | b. [Graph: combined cumulative frequency histogram and ogive. Vertical axis: Cumulative frequency, 0 to 45; horizontal axis: Data, class centres 65 to 115.]
2. Locate the middle of the cumulative frequency axis, which is 22.5.
3. Draw a horizontal line from this point to the ogive and a vertical line to the horizontal axis. | [Graph: the same histogram and ogive, with a horizontal line from 22.5 to the ogive and a vertical line down to the horizontal axis.]
4. Read off the value of the median from the x-axis and write the answer. | The median for the given data is approximately 90.
c. 1. The modal class is the class interval with the highest frequency. | c. The class interval 90–<100 has a frequency of 12, which is the highest frequency.
2. Write the answer. | The modal class is the 90–<100 class interval.
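The grouped-data estimates can be sketched in Python as follows. This is an illustration under stated assumptions, not the textbook's method: the function name `grouped_estimates` and the `(lower, upper, frequency)` layout are invented for the example, and the median is estimated by straight-line interpolation along the ogive between class boundaries, which stands in for reading the value off a hand-drawn graph.

```python
def grouped_estimates(classes):
    """Estimate mean, median and modal class(es) from grouped data.

    classes: list of (lower, upper, frequency) tuples for intervals
    lower-<upper, listed in ascending order.
    """
    n = sum(f for _, _, f in classes)

    # Mean: every value in a class is approximated by the class centre.
    mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / n

    # Median: find where the ogive reaches n / 2, interpolating
    # linearly inside the class in which that happens.
    target = n / 2
    cumulative = 0
    median = None
    for lo, hi, f in classes:
        if f > 0 and cumulative + f >= target:
            median = lo + (target - cumulative) / f * (hi - lo)
            break
        cumulative += f

    # Modal class: the interval(s) with the highest frequency.
    highest = max(f for _, _, f in classes)
    modal_classes = [(lo, hi) for lo, hi, f in classes if f == highest]
    return mean, median, modal_classes


# Data from Worked example 3.
classes = [(60, 70, 5), (70, 80, 7), (80, 90, 10),
           (90, 100, 12), (100, 110, 8), (110, 120, 3)]
mean, median, modal_classes = grouped_estimates(classes)
print(round(mean, 1))    # 89.4
print(round(median, 1))  # 90.4
print(modal_classes)     # [(90, 100)]
```

The interpolated median of about 90.4 agrees with the value of approximately 90 read from the ogive in the worked example. Both are estimates, because the original data values are lost once the data are grouped.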

Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Video eLesson Mean and median (eles-1905)
Interactivities Individual pathway interactivity: Measures of central tendency (int-4621)
Mean (int-3818)
Median (int-3819)
Mode (int-3820)
Ogives (int-6174)



Exercise 12.2 Measures of central tendency

Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 2, 7, 10, 14, 17, 18, 23 3, 4, 6, 8, 11, 15, 19, 20, 24 5, 9, 12, 13, 16, 21, 22, 25

Fluency
WE1 For questions 1 to 5, calculate:

a. the mean b. the median c. the mode.


1. 3, 5, 6, 8, 8, 9, 10

2. 4, 6, 7, 4, 8, 9, 7, 10

3. 17, 15, 48, 23, 41, 56, 61, 52

4. 4.5, 4.7, 4.8, 4.8, 4.9, 5.0, 5.3

5. $7\tfrac{1}{2}$, $10\tfrac{1}{4}$, 12, $12\tfrac{1}{4}$, 13, $13\tfrac{1}{2}$, $13\tfrac{1}{2}$, 14
6. The back-to-back stem-and-leaf plot shows the test results of 25 Year 10 students in Mathematics and Science. Calculate the mean, median and mode for each of the two subjects.
Key: 3 | 2 = 32
Leaf: Science | Stem | Leaf: Mathematics
8 7 3 | 3 | 2 9
9 6 2 2 1 | 4 | 0 6 8
8 7 6 1 1 0 | 5 | 1 3 5
9 7 4 3 2 | 6 | 2 6 7 9
8 5 1 0 | 7 | 3 6 7 8
7 3 | 8 | 0 4 4 6 8 9
 | 9 | 2 5 8

7. WE2 Use the frequency distribution table shown to calculate:


a. the mean b. the median c. the mode.

Score (x) Frequency ( f)


4 3
5 6
6 9
7 4
8 2
Total 24



8. Use the frequency distribution table shown to calculate:
a. the mean b. the median c. the mode.

Score (x) Frequency ( f)


12 4
13 5
14 10
15 12
16 9
Total 40
9. The following data show the number of bedrooms in each of the 10 houses in a particular neighbourhood:
2, 1, 3, 4, 2, 3, 2, 2, 3, 3.
a. Calculate the mean and median number of bedrooms.
b. A local motel contains 20 rooms. Add this observation to the set of data and recalculate the values of the
mean and median.
c. Compare the answers obtained in parts a and b and complete the following statement:
When the set of data contains an unusually large value(s), called an outlier, the ________ (mean/median)
is the better measure of central tendency, as it is less affected by this extreme value.
10. WE3 For the given data:
a. estimate the mean b. estimate the median c. determine the modal class.

Class interval Frequency
40–<50 4
50–<60 4
60–<70 6
70–<80 9
80–<90 5
90–<100 4
Total 32

11. Calculate the mean of the grouped data shown in the table below.

Class interval Frequency


100–109 3
110–119 7
120–129 10
130–139 6
140–149 4
Total 30
12. Determine the modal class of the data shown in the table below.

Class interval Frequency
50–<55 1
55–<60 3
60–<65 4
65–<70 5
70–<75 3
75–<80 2
Total 18



13. The number of textbooks sold by various bookshops during the second week of December was recorded.
The results are summarised in the table.

Number of
books sold Frequency
220–229 2
230–239 2
240–249 3
250–259 5
260–269 4
270–279 4
Total 20

a. MC The modal class of the data is given by the class interval(s):


A. 220–229 and 230–239 B. 250–259 C. 260–269 and 270–279
D. both A and C E. none of these
b. MC The class centre of the first class interval is:
A. 224 B. 224.5 C. 224.75
D. 225 E. 227
c. MC The median of the data is in the interval:
A. 230–239 B. 240–249 C. 250–259
D. 260–269 E. 270–279
d. MC The estimated mean of the data is:
A. 251 B. 252 C. 253
D. 254 E. 255

Understanding
14. A random sample was taken, composed of 30 people shopping at a supermarket on a Tuesday night. The
amount of money (to the nearest dollar) spent by each person was recorded as follows.
6 32 66 17 45 1 19 52 36 23 28 20 7 47 39
6 68 28 54 9 10 58 40 12 25 49 74 63 41 13
a. Calculate the mean and median amount of money spent at the checkout by the people in this sample.
b. Group the data into class intervals of 10 and complete the frequency distribution table. Use this table to
estimate the mean amount of money spent.
c. Add the cumulative frequency column to your table and fill it in. Hence, construct the ogive.
Use the ogive to estimate the median.
d. Compare the mean and the median of the original data from part a with the mean and the median obtained
for grouped data in parts b and c.
Explain whether the estimates obtained in parts b and c were good enough.
15. Answer the following questions and show your working.
a. Add one more number to the set of data 3, 4, 4, 6 so that the mean of a new set is equal to its median.
b. Design a set of five numbers so that mean = median = mode = 5.
c. In the set of numbers 2, 5, 8, 10, 15, change one number so that the median remains unchanged while the
mean increases by 1.



16. Thirty men were asked to reveal the number of hours they spent doing housework each week. The results are
detailed below.
1, 5, 2, 12, 2, 6, 2, 8, 14, 18,
0, 1, 1, 8, 20, 25, 3, 0, 1, 2,
7, 10, 12, 1, 5, 1, 18, 0, 2, 2
a. Present the data in a frequency distribution table. (Use class intervals of 0–4, 5–9 etc.)
b. Use your table to estimate the mean number of hours that the men spent doing housework.
c. Determine the median class for hours spent by the men at housework.
d. Identify the modal class for hours spent by the men at housework.

Reasoning
17. The data shown give the age of 25 patients admitted to the emergency ward of a hospital.

18, 16, 6, 75, 24,


23, 82, 75, 25, 21,
43, 19, 84, 76, 31,
78, 24, 20, 63, 79,
80, 20, 23, 17, 19

a. Present the data in a frequency distribution table. (Use class intervals of 0 − < 15, 15 − < 30 and so on.)
b. Draw a histogram of the data.
c. Suggest a word to describe the pattern of the data in this distribution.
d. Use your table to estimate the mean age of patients admitted.
e. Determine the median class for age of patients admitted.
f. Identify the modal class for age of patients admitted.
g. Draw an ogive of the data.
h. Use the ogive to determine the median age.
i. Explain if any of your statistics (mean, median or mode) give a clear representation of the typical age of an
emergency ward patient.
j. Give some reasons which could explain the pattern of the distribution of data in this question.

18. MC In a set of data there is one score that is extremely small when compared to all the others.

This outlying value is most likely to:


A. have greatest effect upon the mean of the data
B. have greatest effect upon the median of the data
C. have greatest effect upon the mode of the data
D. have very little effect on any of the statistics as we are told that the number is extremely small
E. none of these
19. The batting scores for two cricket players over 6 innings are as follows.

Player A 31, 34, 42, 28, 30, 41


Player B 0, 0, 1, 0, 250, 0
a. Calculate the mean score for each player.
b. State which player appears to be better, based upon mean result. Justify your answer.
c. Determine the median score for each player.
d. State which player appears to be better when the decision is based on the median result. Justify your
answer.
e. State which player you think would be the most useful to have in a cricket team. Justify your answer.
Explain how the mean result can sometimes lead to a misleading conclusion.



20. The following frequency table gives the number of employees in different
salary brackets for a small manufacturing plant.

Number of
Position Salary ($) employees
Machine operator 18 000 50
Machine mechanic 20 000 15
Floor steward 24 000 10
Manager 62 000 4
Chief executive officer 80 000 1

a. Workers are arguing for a pay rise but the management of the factory claims that workers are well paid because the mean salary of the factory is $22 100.
Explain whether the management is being honest.


b. Suppose that you were representing the factory workers and had to
write a short submission in support of the pay rise.
How could you explain the management’s claim? Quote some other
statistics in favour of your case.

21. The resting pulse rate of 20 female athletes was measured. The results are detailed below.

50, 52, 48, 52, 71, 61, 30, 45, 42, 48,
43, 47, 51, 62, 34, 61, 44, 54, 38, 40

a. Construct a frequency distribution table. (Use class sizes of 1 –< 10, 10 –< 20 and so on.)
b. Use your table to estimate the mean of the data.
c. Determine the median class of the data.
d. Identify the modal class of the data.
e. Draw an ogive of the data. (You may like to use a graphics calculator for this.)
f. Use the ogive to determine the median pulse rate.

22. Design a set of five numbers with:
a. mean = median = mode
b. mean > median > mode
c. mean < median = mode.

Problem solving
23. The numbers 15, a, 17, b, 22, c, 10 and d have a mean of 14. Calculate the mean of a, b, c and d.

24. The numbers m, n, p, q, r, and s have a mean of a while x, y and z have a mean of b. Calculate the mean of
all nine numbers.
25. The mean and median of six two-digit prime numbers is 39 and the mode is 31. The smallest number is 13.
Determine the six numbers.



LESSON
12.3 Measures of spread
LEARNING INTENTION
At the end of this lesson you should be able to:
• calculate the range and interquartile range of a data set.

12.3.1 Measures of spread


• Measures of spread describe how far data values are spread from the centre or from each other.
• A shoe store proprietor has stores in Newcastle and Wollongong. The number of pairs of shoes sold each
day over one week is recorded below.

Newcastle: 45, 60, 50, 55, 48, 40, 52


Wollongong: 20, 85, 50, 15, 30, 60, 90

In each of these data sets consider the measures of central tendency.

Newcastle: Mean = 50, Median = 50, No mode
Wollongong: Mean = 50, Median = 50, No mode

With these measures being the same for both data sets we could come to the conclusion that both data sets
are very similar; however, if we look at the data sets, they are very different. We can see that the data for
Newcastle are very clustered around the mean, whereas the Wollongong data are spread out more.
The data from Newcastle are between 40 and 60, whereas the Wollongong data are between 15 and 90.
• Range and interquartile range (IQR) are both measures of spread.
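The comparison of the two stores can be checked with a short Python sketch. It is illustrative only: the helper name `centre_and_extremes` is an assumption made for this example and is not part of the lesson.

```python
from statistics import mean, median

# Pairs of shoes sold each day over one week (data from the lesson).
newcastle = [45, 60, 50, 55, 48, 40, 52]
wollongong = [20, 85, 50, 15, 30, 60, 90]

def centre_and_extremes(data):
    """Return (mean, median, lowest score, highest score) for a list of scores."""
    return mean(data), median(data), min(data), max(data)

for name, sales in (("Newcastle", newcastle), ("Wollongong", wollongong)):
    avg, mid, low, high = centre_and_extremes(sales)
    print(f"{name}: mean = {avg}, median = {mid}, lowest = {low}, highest = {high}")
```

Both stores give a mean and median of 50, but the lowest and highest scores show how differently the sales are spread.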

Range
• The most basic measure of spread is the range.
• The range is defined as the difference between the highest and the lowest values in the set of data.

Calculating the range of a data set

Range = highest score − lowest score
      = Xmax − Xmin

WORKED EXAMPLE 4 Calculating the range of a data set

Calculate the range of the given data set: 2.1, 3.5, 3.9, 4.0, 4.7, 4.8, 5.2.

1. THINK: Identify the lowest score (Xmin) of the data set.
   WRITE: Lowest score = 2.1
2. THINK: Identify the highest score (Xmax) of the data set.
   WRITE: Highest score = 5.2
3. THINK: Write the rule for the range.
   WRITE: Range = Xmax − Xmin
4. THINK: Substitute the known values into the rule.
   WRITE: = 5.2 − 2.1
5. THINK: Evaluate and write the answer.
   WRITE: = 3.1
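The rule for the range can also be written as a one-line function. This is a minimal sketch; the function name `data_range` is an assumption for illustration.

```python
def data_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

scores = [2.1, 3.5, 3.9, 4.0, 4.7, 4.8, 5.2]

# Decimals are stored approximately in binary, so round to 1 decimal place
# before displaying the answer.
print(round(data_range(scores), 1))
```

Rounding is used here only because the scores are decimals; for whole-number scores the subtraction is exact.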



Interquartile range
• The interquartile range (IQR) is the spread of the middle 50% of all the scores in an ordered set. When
calculating the interquartile range, the data are first organised into quartiles, each containing 25% of
the data.
• The word ‘quartile’ comes from the word ‘quarter’.

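[Diagram: an ordered data set divided into four parts, each containing 25% of the data, with markers at the minimum, Q1 (lower quartile), Q2 (median), Q3 (upper quartile) and the maximum]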

• The lower quartile (Q1 ) is the median of the lower half of the data set.
• The upper quartile (Q3 ) is the median of the upper half of the data set.

Calculating the IQR


Interquartile range (IQR) = upper quartile − lower quartile
= Qupper − Qlower
= Q3 − Q1

• The IQR is not affected by extremely large or extremely small data values (outliers), so in some
circumstances the IQR is a better indicator of the spread of data than the range.

WORKED EXAMPLE 5 Calculating the IQR of a data set

Calculate the interquartile range (IQR) of the following set of data:

3, 2, 8, 6, 1, 5, 3, 7, 6.

1. THINK: Arrange the scores in order.
   WRITE: 1, 2, 3, 3, 5, 6, 6, 7, 8
2. THINK: Locate the median and use it to divide the data into two halves.
   Note: The median is the 5th score in this data set and should not be included in the lower or
   upper ends of the data.
   WRITE: 1, 2, 3, 3, [5], 6, 6, 7, 8
3. THINK: Calculate Q1, the median of the lower half of the data.
   WRITE: Q1 = (2 + 3)/2 = 5/2 = 2.5
4. THINK: Calculate Q3, the median of the upper half of the data.
   WRITE: Q3 = (6 + 7)/2 = 13/2 = 6.5
5. THINK: Calculate the interquartile range.
   WRITE: IQR = Q3 − Q1 = 6.5 − 2.5
6. THINK: Write the answer.
   WRITE: IQR = 4
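The method used in this worked example (find the median, leave it out when the number of scores is odd, then take the median of each half) can be sketched in Python as follows. The function names `quartiles` and `iqr` are assumptions for illustration.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, median, Q3) using the median-of-halves method.

    When the number of scores is odd, the middle score is left out of
    both the lower half and the upper half.
    """
    if len(scores) < 2:
        raise ValueError("at least two scores are needed")
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # scores below the median position
    upper_half = ordered[-half:]   # scores above the median position
    return median(lower_half), median(ordered), median(upper_half)

def iqr(scores):
    """Interquartile range = Q3 - Q1."""
    q1, _, q3 = quartiles(scores)
    return q3 - q1

data = [3, 2, 8, 6, 1, 5, 3, 7, 6]
print(quartiles(data))  # (2.5, 5, 6.5)
print(iqr(data))        # 4.0
```

Calculators and other software may use slightly different conventions for quartiles, so small differences from this method are possible on other data sets.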

TI | THINK
1. In a new problem, on a Lists & Spreadsheet page, label column A as ‘xvalues’. Enter the
   values from the data set. Press ENTER after entering each value.
2. To find the summary statistics, open a Calculator page and press:
   • MENU
   • 6: Statistics
   • 1: Stat Calculations
   • 1: One-Variable Statistics
   Select 1 as the number of lists. Then on the One-Variable Statistics page select ‘xvalues’ as
   the X1 List and leave the Frequency as 1. Leave the remaining fields empty, TAB to OK and
   press ENTER. The summary statistics are shown.
TI | DISPLAY/WRITE
The IQR = Q3 − Q1 = 6.5 − 2.5 = 4

CASIO | THINK
1. On the Statistics screen, label list1 as ‘x’ and enter the values as shown in the table.
   Press EXE after entering each value.
2. To find the summary statistics, tap:
   • Calc
   • One-Variable
   Set values as:
   • XList: main\x
   • Freq: 1
   Tap OK.
   Calculate the IQR = Q3 − Q1.
CASIO | DISPLAY/WRITE
The IQR = Q3 − Q1 = 6.5 − 2.5 = 4



Determining the IQR from a graph
• When data are presented in a frequency distribution table, either ungrouped or grouped, the interquartile
range is found by drawing an ogive.

WORKED EXAMPLE 6 Calculating the IQR from a graph

The following frequency distribution table gives the number of customers who order different
volumes of concrete from a readymix concrete company during the course of a day.
Calculate the interquartile range of the data.

Volume (m3)     Frequency     Volume (m3)     Frequency
0.0 −< 0.5      15            1.5 −< 2.0      8
0.5 −< 1.0      12            2.0 −< 2.5      2
1.0 −< 1.5      10            2.5 −< 3.0      4

1. THINK: To calculate the 25th and 75th percentiles from the ogive, first add a class centre column
   and a cumulative frequency column to the frequency distribution table and fill them in.
   WRITE/DRAW:
   Volume (m3)     Class centre    f     cf
   0.0 −< 0.5      0.25            15    15
   0.5 −< 1.0      0.75            12    27
   1.0 −< 1.5      1.25            10    37
   1.5 −< 2.0      1.75            8     45
   2.0 −< 2.5      2.25            2     47
   2.5 −< 3.0      2.75            4     51
2. THINK: Draw the ogive. A percentage axis will be useful.
   WRITE/DRAW: [Figure: ogive of cumulative frequency (0 to 50) and cumulative frequency percentage
   (0% to 100%) against volume (m3)]
3. THINK: Identify the upper quartile (75th percentile) and lower quartile (25th percentile) from
   the ogive.
   WRITE/DRAW: Q3 = 1.6 m3
               Q1 = 0.4 m3
4. THINK: The interquartile range is the difference between the upper and lower quartiles.
   WRITE/DRAW: IQR = Q3 − Q1
                   = 1.6 − 0.4
                   = 1.2 m3
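Reading quartiles from an ogive can be imitated numerically. The sketch below is one possible approach, not the textbook's method: it assumes the ogive is drawn as straight line segments joining the cumulative frequencies at the class boundaries, and the function name `percentile_from_classes` is hypothetical.

```python
def percentile_from_classes(boundaries, frequencies, percent):
    """Estimate a percentile by linear interpolation along the ogive.

    boundaries:  class boundaries (one more value than there are classes)
    frequencies: frequency of each class, in the same order
    percent:     percentile wanted, e.g. 25 for Q1 or 75 for Q3
    """
    total = sum(frequencies)
    target = percent / 100 * total   # cumulative frequency to reach
    cumulative = 0
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        if f > 0 and cumulative + f >= target:
            # Move the required fraction of the way across this class.
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    return boundaries[-1]

# Readymix concrete data from the worked example.
boundaries = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
frequencies = [15, 12, 10, 8, 2, 4]

q1 = percentile_from_classes(boundaries, frequencies, 25)
q3 = percentile_from_classes(boundaries, frequencies, 75)
print(round(q1, 1), round(q3, 1), round(q3 - q1, 1))
```

To 1 decimal place the estimates agree with the values read from the graph. A hand-drawn ogive is only as accurate as the drawing, so small differences in the second decimal place are expected.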

Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: Measures of spread (int-4622)
Range (int-3822)
The interquartile range (int-4813)



Exercise 12.3 Measures of spread
12.3 Quick quiz 12.3 Exercise

Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 4, 7, 10, 13 2, 6, 8, 11, 14 3, 5, 9, 12, 15

Fluency
1. WE4 Calculate the range for each of the following sets of data.
a. 4, 3, 9, 12, 8, 17, 2, 16
b. 49.5, 13.7, 12.3, 36.5, 89.4, 27.8, 53.4, 66.8
c. 7 1/2, 12 3/4, 5 1/4, 8 2/3, 9 1/6, 3 3/4
2. WE5 Calculate the interquartile range (IQR) for the following sets of data.
a. 3, 5, 8, 9, 12, 14
b. 7, 10, 11, 14, 17, 23
c. 66, 68, 68, 70, 71, 74, 79, 80
d. 19, 25, 72, 44, 68, 24, 51, 59, 36
3. The following stem-and-leaf plot shows the mass of newborn babies (rounded to the nearest 100 g).
Calculate:
a. the range of the data b. the IQR of the data.

Key: 1* | 9 = 1.9 kg
Stem  Leaf
1*    9
2     2 4
2*    6 7 8 9
3     0 0 1 2 3 4
3*    5 5 6 7 8 8 8 9
4     0 1 3 4 4
4*    5 6 6 8 9
5     0 1 2 2

4. Use the ogive shown to calculate the interquartile range of the data.
[Figure: ogive showing cumulative frequency (0 to 50) and cumulative frequency percentage (0% to 100%) against height (cm), from 100 to 180]



5. WE6 The following frequency distribution table gives the amount of time spent by 50 people shopping for
Christmas presents.

Time (h) 0 − < 0.5 0.5 − < 1 1 − < 1.5 1.5 − < 2 2 − < 2.5 2.5 − < 3 3 − < 3.5 3.5 − < 4
Frequency 1 2 7 15 13 8 2 2

Estimate the IQR of the data.


6. MC Calculate the interquartile range of the following data:
17, 18, 18, 19, 20, 21, 21, 23, 25
A. 8 B. 18 C. 4 D. 20 E. 25

Understanding
7. The following frequency distribution table shows the life expectancy in hours of 40 household batteries.

Life (h) 50 − < 55 55 − < 60 60 − < 65 65 − < 70 70 − < 75 75 − < 80


Frequency 4 10 12 8 5 1

a. Draw an ogive curve that represents the data in the table above.
b. Use the ogive to answer the following questions.
i. Calculate the median score.
ii. Determine the upper and lower quartiles.
iii. Calculate the interquartile range.
iv. Identify the number of batteries that lasted less than 60 hours.
v. Identify the number of batteries that lasted 70 hours or more.

8. Calculate the IQR for the following data.

Class interval    Frequency
120 −< 130        2
130 −< 140        3
140 −< 150        9
150 −< 160        14
160 −< 170        10
170 −< 180        8
180 −< 190        6
190 −< 200        3

9. For each of the following sets of data, state:


i. the range
ii. the IQR.
a. 6, 9, 12, 13, 20, 22, 26, 29
b. 7, 15, 2, 26, 47, 19, 9, 33, 38
c. 120, 99, 101, 136, 119, 87, 123, 115, 107, 100



Reasoning
10. Explain what the measures of spread tell us about a set of data.

11. As newly appointed coach of Terrorolo’s Meteors netball team, Kate decided to record each player’s
statistics for the previous season. The number of goals scored by the leading goal shooter was:
1, 3, 8, 18, 19, 23, 25, 25, 25, 26, 27, 28,
28, 28, 28, 29, 29, 30, 30, 33, 35, 36, 37, 40

a. Calculate the mean of the data.


b. Calculate the median of the data.
c. Calculate the range of the data.
d. Determine the interquartile range of the data.
e. There are three scores that are much lower than most. Explain the effect these scores have on the
summary statistics.
12. The following back-to-back stem-and-leaf plot shows the ages of 30 pairs of men and women when entering
their first marriage.
Key: 1 ∣ 6 = 16 years old
Leaf: Men Stem Leaf: Women
998 1 67789
99887644320 2 001234567789
9888655432 3 01223479
6300 4 1248
60 5 2

a. Determine the mean, median, range and interquartile range of each set.
b. Write a short paragraph comparing the two distributions.

Problem solving
13. Calculate the mean, median, mode, range and IQR of the following data
collected when the temperature of the soil around 25 germinating seedlings
was recorded:

28.9, 27.4, 23.6, 25.6, 21.1, 22.9, 29.6, 25.7, 27.4, 23.6, 22.4, 24.6, 21.8,
26.4, 24.9, 25.0, 23.5, 26.1, 23.6, 25.3, 29.5, 23.5, 22.0, 27.9, 23.6.
14. Four positive numbers a, b, c and d have a mean of 12, a median and mode
of 9 and a range of 14. Determine the values of a, b, c and d.

15. A set of five positive integer scores has the following summary statistics:
• range = 9
• median = 6
• Q1 = 3 and Q3 = 9.
a. Explain whether the five scores could be 1, 3, 6, 9 and 10.
b. A sixth score is added to the set. Determine whether there is a score that will maintain the summary
statistics given above. Justify your answer.



LESSON
12.4 Box plots
LEARNING INTENTIONS
At the end of this lesson you should be able to:
• calculate the five-number summary for a set of data
• draw a box plot showing the five-number summary of a data set
• calculate outliers in a data set
• describe the skewness of distributions
• compare box plots to dot plots or histograms
• draw parallel box plots and compare sets of data.

12.4.1 Five-number summary


• A five-number summary is a list consisting of the lowest score (Xmin ), lower quartile (Q1 ), median (Q2 ),
upper quartile (Q3 ) and greatest score (Xmax ) of a set of data.

WORKED EXAMPLE 7 Calculations using the five-number summary

From the following five-number summary, calculate:
a. the interquartile range b. the range.

Xmin    Q1    Median (Q2)    Q3    Xmax
29      37    39             44    48

a. THINK: The interquartile range is the difference between the upper and lower quartiles.
   WRITE: IQR = Q3 − Q1
              = 44 − 37
              = 7
b. THINK: The range is the difference between the greatest score and the lowest score.
   WRITE: Range = Xmax − Xmin
                = 48 − 29
                = 19

12.4.2 Box plots


• A box plot is a graph of the five-number summary.
• Box plots consist of a central divided box with attached whiskers.
• The box spans the interquartile range.
• The median is marked by a vertical line drawn inside the box.
• The whiskers indicate the range of scores:

The lowest score The lower quartile The median The upper quartile The largest score
Xmin Q1 M Q3 Xmax



• Box plots are always drawn to scale.
• They are presented either with the five-number summary figures attached as labels (diagram at right)
or with a scale presented alongside the box plot like the diagram below. They can also be
drawn vertically.

[Diagram: box plot labelled with the five-number summary figures 4, 15, 21, 23, 28]
[Diagram: box plot drawn above a scale from 0 to 30]

Identification of extreme values or outliers


• If an extreme value or outlier occurs in a set of data, it can be denoted by a small cross on the box plot. The
whisker is then shortened to the next largest, or smallest, value.
• The box plot below shows that the lowest score was 5. This was an extreme value as the rest of the scores
were located within the range 15 to 42.
• Outliers are still included when calculating the range of the data.

[Diagram: box plot drawn above a scale from 0 to 45, with the extreme value at 5 marked by a cross]

• Outliers sit more than 1.5 × IQR away from Q1 or Q3 .

Identifying outliers
Lower limit = Q1 − 1.5 × IQR
Upper limit = Q3 + 1.5 × IQR
Any scores that sit outside these limits are considered outliers.
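The two limits can be computed directly from Q1 and Q3. The sketch below is illustrative; the function names `outlier_limits` and `find_outliers` are assumptions, and the example values are the five-number summary from Worked example 7.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(scores, q1, q3):
    """List every score that sits outside the lower and upper limits."""
    lower, upper = outlier_limits(q1, q3)
    return [x for x in scores if x < lower or x > upper]

# Five-number summary from Worked example 7: 29, 37, 39, 44, 48.
print(outlier_limits(37, 44))                        # (26.5, 54.5)
print(find_outliers([29, 37, 39, 44, 48], 37, 44))   # []
```

Neither the lowest score (29) nor the greatest score (48) sits outside the limits, so that summary contains no outliers.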

Symmetry and skewness in distributions


• A symmetrical plot has data that are evenly spaced around a central point.

Examples of a symmetrical stem-and-leaf plot and a symmetrical box plot are shown.

Stem  Leaf
26*   6
27    0 1 3
27*   5 6 8 9
28    0 1 1 1 2 4
28*   5 7 8 8
29    2 2 2
29*   5

[Diagram: symmetrical box plot drawn above a scale from 20 to 30]



• A negatively skewed plot has a higher frequency of data at the higher end. This is illustrated by the
stem-and-leaf plot below where the leaves increase in length as the data increase in value.
It is illustrated on the box plot when the median is much closer to the maximum value than the
minimum value.
Stem  Leaf
5     1
6     2 9
7     1 1 2 2
8     1 4 4 5 6 6
9     5 3 4 4 5 6 7 7 7

[Diagram: negatively skewed box plot drawn above an x-axis from 0 to 10]
• A positively skewed plot has a higher frequency of data at the lower end. This is illustrated on the
stem-and-leaf plot below where the leaves increase in length as the data decrease in value.
It is illustrated on the box plot when the median is much closer to the minimum value than the
maximum value.
Stem  Leaf
5     1 3 4 4 5 6 7 7 7
6     2 4 4 5 6 6
7     1 1 2 2
8     1 6
9     5

[Diagram: positively skewed box plot drawn above an x-axis from 0 to 10]

WORKED EXAMPLE 8 Drawing a box plot

The following stem-and-leaf plot gives the speed of 25 cars caught by a roadside speed camera.

Key: 8 | 2 = 82 km/h, 8* | 6 = 86 km/h

Stem  Leaf
8     2 2 4 4 4 4
8*    5 5 6 6 7 9 9 9
9     0 1 1 2 4
9*    5 6 9
10    0 2
10*
11    4

a. Prepare a five-number summary of the data and draw a box plot to represent it.
b. Identify any outliers and redraw the box plot with outliers marked.
c. Describe the distribution of the data.

1. THINK: First identify the positions of the median and upper and lower quartiles. There are 25 data
   values. The median is the (n + 1)/2 th score. The lower quartile is the median of the lower half
   of the data. The upper quartile is the median of the upper half of the data (each half contains
   12 scores).
   WRITE: The median is the (25 + 1)/2 th score; that is, the 13th score.
   Q1 is the (12 + 1)/2 th score in the lower half; that is, the 6.5th score. That is, halfway between
   the 6th and 7th scores.
   Q3 is halfway between the 6th and 7th scores in the upper half of the data.
2. THINK: Mark the positions of the median and upper and lower quartiles on the stem-and-leaf plot.
   WRITE:
   Key: 8 | 2 = 82 km/h, 8* | 6 = 86 km/h
   Stem  Leaf
   8     2 2 4 4 4 4
   8*    5 5 6 6 7 9 9 9
   9     0 1 1 2 4
   9*    5 6 9
   10    0 2
   10*
   11    4
   Q1 is marked between 84 and 85, the median at 89 and Q3 between 94 and 95.

a. THINK: Write the five-number summary:
   The lowest score is 82.
   The lower quartile is between 84 and 85; that is, 84.5.
   The median is 89.
   The upper quartile is between 94 and 95; that is, 94.5.
   The greatest score is 114.
   Draw the box plot for this summary.
   WRITE: Five-number summary:
   Xmin    Q1      Q2    Q3      Xmax
   82      84.5    89    94.5    114
   [Figure: box plot of speed (km/h) drawn above a scale from 80 to 120]

b. 1. THINK: Calculate the IQR.
      WRITE: IQR = Q3 − Q1
                 = 94.5 − 84.5
                 = 10
   2. THINK: Calculate the lower and upper limits.
      WRITE: Lower limit = 84.5 − 1.5 × 10
                         = 69.5
             Upper limit = 94.5 + 1.5 × 10
                         = 109.5
   3. THINK: Identify the outliers.
      WRITE: 114 is above the upper limit of 109.5, so it is an outlier.
   4. THINK: Redraw the box plot, including the outlier marked as a cross. Draw the whisker to the
      next largest figure, 102.
      WRITE: [Figure: box plot of speed (km/h) drawn above a scale from 80 to 120, with the upper
      whisker ending at 102 and a cross at 114]

c. THINK: Describe the distribution.
   WRITE: The data is positively skewed with an outlier at 114.
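The whole of this worked example can be checked with a short Python sketch. It assumes the same median-of-halves method for the quartiles; the function name `five_number_summary` and the variable names are hypothetical.

```python
from statistics import median

def five_number_summary(scores):
    """Return (Xmin, Q1, median, Q3, Xmax) using the median-of-halves method."""
    ordered = sorted(scores)
    half = len(ordered) // 2           # the middle score is excluded when n is odd
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

# Speeds (km/h) read from the stem-and-leaf plot.
speeds = [82, 82, 84, 84, 84, 84, 85, 85, 86, 86, 87, 89, 89, 89,
          90, 91, 91, 92, 94, 95, 96, 99, 100, 102, 114]

x_min, q1, med, q3, x_max = five_number_summary(speeds)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Scores outside the limits are outliers; the whisker stops at the
# largest score that is still inside the upper limit.
outliers = [s for s in speeds if s < lower_limit or s > upper_limit]
whisker_end = max(s for s in speeds if s <= upper_limit)

print((x_min, q1, med, q3, x_max))   # (82, 84.5, 89, 94.5, 114)
print(lower_limit, upper_limit)      # 69.5 109.5
print(outliers, whisker_end)         # [114] 102
```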



TI | THINK
a. 1. In a new problem, on a Lists & Spreadsheet page, label column A as ‘cars’ and enter the
      values from the stem-and-leaf plot. Press ENTER after each value.
   2. To find a five-point summary of the data, on a Calculator page press:
      • CATALOG
      • 1
      • F
      Then use the down arrow to scroll down to FiveNumSummary and press ENTER.
   3. Press VAR and select ‘cars’. Complete the entry line as:
      FiveNumSummary cars
      Press ENTER. Then press VAR, select ‘stat.results’ and press ENTER.
b. To construct the box-and-whisker plot, open a Data & Statistics page. Press TAB to locate the
   label of the horizontal axis and select the variable ‘cars’. Then press:
   • MENU
   • 1: Plot Type
   • 2: Box Plot
   To change the colour, place the pointer over one of the data points. Then press CTRL MENU.
   Then press:
   • 6: Color
   • 2: Fill Color.
   Select whichever colour you like from the palette. Press ENTER.

CASIO | THINK
a. 1. On the Statistics screen, label list1 as ‘cars’ and enter the values from the
      stem-and-leaf plot. Press EXE after entering each value.
   2. To find the summary statistics, tap:
      • Calc
      • One-Variable
      Set values as:
      • XList: main\cars
      • Freq: 1
   3. Tap OK. Scroll down to find more statistics.
b. To construct the box-and-whisker plot, tap:
   • SetGraph
   • Setting...
   Set:
   • Type: MedBox
   • XList: main\cars
   • Freq: 1
   Then tap the graphing icon.

DISPLAY/WRITE
a. minX = 82
   Q1 = 84.5
   Med = 89
   Q3 = 94.5
   maxX = 114
b. The box-and-whisker plot is displayed. As you scroll over the box-and-whisker plot, the values
   of the five-number summary statistics are displayed. The data are skewed (positively).



12.4.3 Comparing different graphical representations
Box plots and dot plots
• A box plot can be directly related to a dot plot.
• Dot plots display each data value represented by a dot placed on a number line.
• The following data represent the amount of money (in $) that a group of 27 five-year-olds had with them on
a day visiting the zoo with their parents.
0 0.85 0 1.8 1.65 8.45 3.75 0.55 4.1 2.4 2.15
1.2 1.35 0.9 3.45 1 0 0 1.45 1.25 1.7 2.65
1.85 4.75 3.9 1.15
• The dot plot below and its comparative box plot show the distribution of the data.

[Figure: dot plot and box plot of the amount of money ($), drawn above a scale from 0 to 10]
• Both graphs indicate that the data is positively skewed and both graphs indicate the presence of the outlier.
However, the box plot provides a more concise summary of the centre and spread of the distribution.

Box plots and histograms


• Histograms are graphs that represent the distribution of numerical data. They are commonly used in
statistics to visualise the frequency distribution of a set of continuous or discrete data.
• The following data are the number of minutes, rounded to the nearest minute, that forty Year 10 students
take to travel to their school on a particular day.

15, 22, 14, 12, 21, 34, 19, 11, 13, 0, 16,
4, 23, 8, 12, 18, 24, 17, 14, 3, 10, 12,
9, 15, 20, 5, 19, 13, 17, 11, 16, 19, 24,
12, 7, 14, 17, 10, 14, 23
• The data are displayed in the histogram and box plot shown.
• Both graphs indicate that the data is approximately symmetrical with an outlier. The histogram clearly
shows the frequencies of each class interval. Neither graph displays the original values.
• The histogram does not give precise information about the centre, but the distribution of the data is
visible.
• However, the box plot shows the presence of an outlier and provides a summary of the centre and
spread of the distribution.

[Figure: histogram of frequency (0 to 16) against number of minutes (0 to 35), with a box plot on the same scale showing an outlier marked by a cross]
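The class frequencies behind a histogram like this one can be tallied with a short Python sketch. The class width of 5 minutes is an assumption made for this example (the width used in the figure is not stated in the text), and the function name `tally_classes` is hypothetical.

```python
from collections import Counter

def tally_classes(values, width):
    """Count the values in each class interval: lower boundary <= value < lower boundary + width.

    Returns a dictionary mapping each lower class boundary to its frequency.
    Classes with a frequency of zero are not listed.
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Travel times (minutes) of the forty Year 10 students.
travel_times = [15, 22, 14, 12, 21, 34, 19, 11, 13, 0, 16,
                4, 23, 8, 12, 18, 24, 17, 14, 3, 10, 12,
                9, 15, 20, 5, 19, 13, 17, 11, 16, 19, 24,
                12, 7, 14, 17, 10, 14, 23]

for lower, frequency in tally_classes(travel_times, 5).items():
    print(f"{lower} -< {lower + 5}: {frequency}")
```

With this class width the tallest column is the 10 −< 15 class, and the single time of 34 minutes stands apart from the rest of the data.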



Parallel box plots
• Parallel box plots are useful tools to compare two sets of data.
• Parallel box plots are two or more box plots displayed with the same scale.

WORKED EXAMPLE 9 Comparing two sets of data

Each member of a class was given a jelly snake to stretch. They each
measured the initial length of their snake to the nearest centimetre and
then slowly stretched the snake to make it as long as possible.
They then measured the maximum length of the snake by recording how
far it had stretched at the time it broke. The results were recorded in the
following table.
Initial length Stretched Initial length Stretched
(cm) length (cm) (cm) length (cm)
13 29 14 27
14 28 13 27
17 36 15 36
10 24 16 36
14 35 15 36
16 36 16 34
15 37 17 35
16 37 12 27
14 30 9 17
16 33 16 41
17 36 17 38
16 38 16 36
17 38 17 41
14 31 16 33
17 40 11 21
The above data was drawn on parallel box plots as shown below.

[Figure: parallel box plots labelled Stretched and Initial, drawn on a shared axis marked from 100 to 500 and labelled Length of snake (cm)]

Compare the data sets and draw your conclusions.


THINK WRITE
1. Compare the median in the case of the The change in the length of the snake when stretched
initial and stretched length of the snake. is evidenced by the increased median and spread
shown on the box plots.
The median snake length before being stretched was
15.5 cm, but the median snake length after being
stretched was 35 cm.
2. Compare the spread for each box plot. The range increased after stretching, as did the IQR.



Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: Box-and-whisker plots (int-4623)
Skewness (int-3823)
Box plots (int-6245)
Parallel box plots (int-6248)

Exercise 12.4 Box plots


12.4 Quick quiz 12.4 Exercise

Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 4, 6, 9, 12, 13, 16, 19 2, 5, 7, 10, 14, 17, 20 3, 8, 11, 15, 18, 21, 22

Fluency
1. WE7 From the following five-number summary, calculate:
Xmin Q1 Median Q3 Xmax
6 11 13 16 32
a. the interquartile range b. the range.

2. From the following five-number summary, calculate:

Xmin Q1 Median Q3 Xmax


101 119 122 125 128
a. the interquartile range b. the range.

3. From the following five-number summary, calculate:

Xmin Q1 Median Q3 Xmax


39.2 46.5 49.0 52.3 57.8
a. the interquartile range b. the range.

4. The box plot shows the distribution of final points scored by a football team over a season’s roster.

50 70 90 110 130 150


Points

a. Identify the team’s greatest points score.


b. Identify the team’s least points score.
c. Calculate the team’s median points score.
d. Calculate the range of points scored.
e. Calculate the interquartile range of points scored.



5. The box plot shows the distribution of data formed by counting the
number of gummy bears in each of a large sample of packs.

[Figure: box plot of the number of gummy bears, drawn above a scale from 30 to 60]

a. Identify the largest number of gummy bears in any pack.
b. Identify the smallest number of gummy bears in any pack.
c. Identify the median number of gummy bears in any pack.
d. Calculate the range of numbers of gummy bears per pack.
e. Calculate the interquartile range of gummy bears per pack.

Questions 6 to 8 refer to the following box plot.

5 10 15 20 25 30
Score
6. MC The median of the data is:
A. 20 B. 23 C. 25 D. 31 E. 5

7. MC The interquartile range of the data is:


A. 23 B. 26 C. 5 D. 20 to 25 E. 31

8. MC Select which of the following is not true of the data represented by the box plot.
A. One-quarter of the scores are between 5 and 20.
B. Half of the scores are between 20 and 25.
C. The lowest quarter of the data is spread over a wide range.
D. Most of the data are contained between the scores of 5 and 20.
E. One-third of the scores are between 5 and 20.

Understanding
9. The number of sales made each day by a salesperson is recorded over a 2-week period:

25, 31, 28, 43, 37, 43, 22, 45, 48, 33


a. Prepare a five-number summary of the data. (There is no need to draw a stem-and-leaf
plot of the data. Just arrange them in order of size.)
b. Draw a box plot of the data.

10. The data below show monthly rainfall in millimetres.


J F M A M J J A S O N D
10 12 21 23 39 22 15 11 22 37 45 30

a. Prepare a five-number summary of the data.


b. Draw a box plot of the data.

11. WE8 The stem-and-leaf plot shown details the age of 25 offenders who were
caught during random breath testing.

Key: 1 | 8 = 18 years
Stem  Leaf
1     8 8 9 9 9
2     0 0 0 1 1 3 4 6 9
3     0 1 2 7
4     2 5
5     3 6 8
6     6
7     4

a. Prepare a five-number summary of the data.
b. Draw a box plot of the data.
c. Describe the distribution of the data.



12. The following stem-and-leaf plot details the price at which 30 blocks of land in a particular suburb
were sold.

Key: 12|4 = $124 000


Stem Leaf
12 4 7 9
13 0 0 2 5 5
14 0 0 2 3 5 5 7 9 9
15 0 0 2 3 7 7 8
16 0 2 2 5 8
17 5

a. Prepare a five-number summary of the data.


b. Draw a box plot of the data.

13. Prepare comparative box plots for the following dot plots (using the same axis) and describe what each plot
reveals about the data.
a. Number of sick days taken by workers last year at factory A

0 1 2 3 4 5 6 7
b. Number of sick days taken by workers last year at factory B

0 2 4 6 8 10 12 14

14. An investigation into the transport needs of an outer suburb community recorded
the number of passengers boarding a bus during each of its journeys, as follows.
12, 43, 76, 24, 46, 24, 21, 46, 54, 109, 87, 23, 78,
37, 22, 139, 65, 78, 89, 52, 23, 30, 54, 56, 32, 66, 49
Display the data by constructing a histogram using class intervals of 20 and a
comparative box plot on the same axis.

15. WE12 At a weight-loss clinic, the following weights (in kilograms) were recorded before and

after treatment.
Before 75 80 75 140 77 89 97 123 128 95 152 92
After 69 66 72 118 74 83 89 117 105 81 134 85

Before 85 90 95 132 87 109 87 129 135 85 137 102


After 79 84 90 124 83 102 84 115 125 81 123 94

a. Prepare a five-number summary for weight before and after treatment.


b. Draw parallel box plots for weight before and after treatment.
c. Comment on the comparison of weights before and after treatment.

Reasoning
16. Explain the advantages and disadvantages of box plots as a visual form of representing data.



17. The following data detail the number of hamburgers sold by a fast
food outlet every day over a 4-week period.
M T W T F S S
125 144 132 148 187 172 181
134 157 152 126 155 183 188
131 121 165 129 143 182 181
152 163 150 148 152 179 181

a. Prepare a stem-and-leaf plot of the data. (Use a class size of 10.)


b. Draw a box plot of the data.
c. Comment on what these graphs tell you about hamburger sales.

18. The following data show the ages of 30 mothers upon the birth of their first baby.

22, 21, 18, 33, 17, 23, 22, 24, 24, 20,
25, 29, 32, 18, 19, 22, 23, 24, 28, 20,
31, 22, 19, 17, 23, 48, 25, 18, 23, 20

a. Prepare a stem-and-leaf plot of the data. (Use a class size of 5.)


b. Draw a box plot of the data. Indicate any extreme values appropriately.
c. Describe the distribution in words. Comment on what the distribution says about the age that mothers
have their first baby.

Problem solving
19. Sketch a histogram for the box plot shown.

20. Consider the box plot below, which shows the number of weekly sales of houses by two real estate agencies.

HJ Looker

Hane & Roarne

0 1 2 3 4 5 6 7 8 9 10
Number of weekly sales

a. Determine the median number of weekly sales for each real estate agency.
b. State which agency had the greater range of sales. Justify your answer.
c. State which agency had the greater interquartile range of sales. Justify your answer.
d. State which agency performed better. Explain your answer.
21. Fifteen French restaurants were visited by three newspaper restaurant reviewers. The average price of a meal
for a single person was investigated. The following box plot shows the results.

0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160


Price ($)



a. Identify the price of the cheapest meal.
b. Identify the price of the most expensive meal.
c. Identify the median cost of a meal.
d. Calculate the interquartile range for the price of a meal.
e. Determine the percentage of the prices that were below the median.
22. The following data give the box plots for three different age groups in a triathlon for under thirties.

[Figure: parallel box plots for the age groups Under 20, 20–24 and 25–29, drawn on an axis of time in seconds from 7000 to 10 000]

a. Identify the slowest time for the 20–24-year-olds.


b. Estimate the difference in time between the fastest triathlete in:
i. the under 20s and the 20–24 group
ii. the under 20s and the 25–29 group
iii. the 20–24 group and the 25–29 group.
c. Comment on the overall performance of the three groups.

LESSON
12.5 The standard deviation (Optional)
LEARNING INTENTIONS
At the end of this lesson you should be able to:
• calculate the standard deviation of a small data set by hand
• calculate the standard deviation using technology
• interpret the mean and standard deviation of data
• identify the effect of outliers on the standard deviation.

12.5.1 Standard deviation


• The standard deviation for a set of data is a measure of how far the data values are spread out (deviate)
from the mean. The value of the standard deviation tells you the average deviation of the data from
the mean.
• Deviation is the difference between each data value and the mean, $(x - \bar{x})$. The standard deviation is
calculated from the square of the deviations.



Standard deviation formula
• Standard deviation is denoted by the lowercase Greek letter sigma, σ, and can be calculated by
using the following formula.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

where $\bar{x}$ is the mean of the data values and n is the number of data values.

• A low standard deviation indicates that the data values tend to be close to the mean.
• A high standard deviation indicates that the data values tend to be spread out over a large range, away from
the mean.
• Standard deviation can be calculated using a scientific or graphics calculator, or it can be calculated from a
frequency table by following the steps below.
Step 1 Calculate the mean.
Step 2 Calculate the deviations.
Step 3 Square each deviation.
Step 4 Sum the squares.
Step 5 Divide the sum of the squares by the number of data values.
Step 6 Take the square root of the result.
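The six steps above can be followed line by line in a short Python sketch. It is illustrative only: the function name `population_sd` is an assumption, and the shoe-store data from lesson 12.3 are reused to show how the standard deviation separates two data sets that share the same mean.

```python
from math import sqrt
from statistics import pstdev

def population_sd(values):
    """Standard deviation calculated by following Steps 1 to 6."""
    n = len(values)
    mean = sum(values) / n                    # Step 1: calculate the mean
    deviations = [x - mean for x in values]   # Step 2: calculate the deviations
    squares = [d ** 2 for d in deviations]    # Step 3: square each deviation
    total = sum(squares)                      # Step 4: sum the squares
    mean_square = total / n                   # Step 5: divide by the number of data values
    return sqrt(mean_square)                  # Step 6: take the square root

newcastle = [45, 60, 50, 55, 48, 40, 52]
wollongong = [20, 85, 50, 15, 30, 60, 90]

print(round(population_sd(newcastle), 2))
print(round(population_sd(wollongong), 2))

# Python's built-in population standard deviation should agree.
print(round(pstdev(newcastle), 2), round(pstdev(wollongong), 2))
```

Both stores have a mean of 50, but the Wollongong standard deviation is several times larger, which matches the wider spread seen in its sales.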

WORKED EXAMPLE 10 Calculating the standard deviation

The number of lollies in each of 8 packets is 11, 12, 13, 14, 16, 17, 18, 19.
Calculate the mean and standard deviation correct to 2 decimal places. Interpret the result.

1. THINK: Calculate the mean.
   WRITE: $\bar{x} = \frac{11 + 12 + 13 + 14 + 16 + 17 + 18 + 19}{8} = \frac{120}{8} = 15$
2. THINK: To calculate the deviations $(x - \bar{x})$, set up a frequency table as shown and complete.
   WRITE:
   No. of lollies (x)    $(x - \bar{x})$
   11                    11 − 15 = −4
   12                    −3
   13                    −2
   14                    −1
   16                    1
   17                    2
   18                    3
   19                    4
3. THINK: Add another column to the table to calculate the square of the deviations, $(x - \bar{x})^2$.
   Then sum the results: $\sum (x - \bar{x})^2$.
   WRITE:
   No. of lollies (x)    $(x - \bar{x})$    $(x - \bar{x})^2$
   11                    11 − 15 = −4       16
   12                    −3                 9
   13                    −2                 4
   14                    −1                 1
   16                    1                  1
   17                    2                  4
   18                    3                  9
   19                    4                  16
   Total                                    $\sum (x - \bar{x})^2 = 60$
4. THINK: To calculate the standard deviation, divide the sum of the squares by the number of data
   values, then take the square root of the result.
   WRITE: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{60}{8}} \approx 2.74$ (correct to 2 decimal places)
5. THINK: Check the result using a calculator.
   WRITE: The calculator returns an answer of σn = 2.73861. The answer is confirmed.
6. THINK: Interpret the result.
   WRITE: The average (mean) number of lollies in each pack is 15 with a standard deviation of 2.74,
   which means that the number of lollies in each pack differs from the mean by an average of 2.74.

TI | THINK
1. In a new problem, on a Calculator page, complete the entry lines as shown. This stores the data
   values to the variable ‘lollies’.
   lollies := {11, 12, 13, 14, 16, 17, 18, 19}
   Although we can find many summary statistics, to calculate the mean only, open a Calculator
   page and press:
   • MENU
   • 6: Statistics
   • 3: List Math
   • 3: Mean
   Press VAR and select ‘lollies’, then press ENTER.
2. To calculate the population standard deviation only, press:
   • MENU
   • 6: Statistics
   • 3: List Math
   • 9: Population standard deviation
   Press VAR and select ‘lollies’, then press ENTER. Press CTRL ENTER to get a decimal
   approximation.
TI | DISPLAY/WRITE
The mean number of lollies is 15 and the population standard deviation is σ = 2.74.

CASIO | THINK
On the Statistics screen, label list1 as ‘x’ and enter the values from the question. Press EXE
after entering each value. To find the summary statistics, tap:
• Calc
• One-Variable
Set values as:
• XList: main\x
• Freq: 1
Tap OK.
The standard deviation is shown as σx and the mean is shown as $\bar{x}$.
CASIO | DISPLAY/WRITE
The mean number of lollies is 15 with a standard deviation of 2.74.

• When calculating the standard deviation from a frequency table, the frequencies must be taken into
account. Therefore, the following formula is used.

$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}}$$

Here $n = \sum f$ is the total number of data values.

WORKED EXAMPLE 11 Calculating standard deviation from a frequency table

Lucy’s scores in her last 12 games of golf were 87, 88, 88, 89, 90, 90, 90, 92, 93, 93, 95 and 97.
Calculate the mean score and the standard deviation correct to 2 decimal places. Interpret your
result.
THINK WRITE
1. To calculate the mean, first set up a frequency table.

Golf score (x)    Frequency (f)    fx
87                1                87
88                2                176
89                1                89
90                3                270
92                1                92
93                2                186
95                1                95
97                1                97
Total             $\sum f = 12$    $\sum fx = 1092$

2. Calculate the mean.

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1092}{12} = 91$$



3. To calculate the deviations $(x - \bar{x})$, add another column to the frequency table and complete.

Golf score (x)    Frequency (f)    fx                 $(x - \bar{x})$
87                1                87                 87 − 91 = −4
88                2                176                −3
89                1                89                 −2
90                3                270                −1
92                1                92                 1
93                2                186                2
95                1                95                 4
97                1                97                 6
Total             $\sum f = 12$    $\sum fx = 1092$

4. Add another column to the table and multiply the square of the deviations, $(x - \bar{x})^2$, by the frequency: $f(x - \bar{x})^2$. Then sum the results: $\sum f(x - \bar{x})^2$.

Golf score (x)    Frequency (f)    fx                 $(x - \bar{x})$    $f(x - \bar{x})^2$
87                1                87                 87 − 91 = −4       1 × (−4)² = 16
88                2                176                −3                 18
89                1                89                 −2                 4
90                3                270                −1                 3
92                1                92                 1                  1
93                2                186                2                  8
95                1                95                 4                  16
97                1                97                 6                  36
Total             $\sum f = 12$    $\sum fx = 1092$                      102

5. Calculate the standard deviation using the formula.

$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}} = \sqrt{\frac{102}{12}} \approx 2.92 \text{ (correct to 2 decimal places)}$$

6. Check the result using a calculator. The calculator returns an answer of $\sigma_n = 2.91548$. The answer is confirmed.
7. Interpret the result. The average (mean) score for Lucy is 91 with a standard deviation of
2.92, which means that her score differs from the mean by an average
of 2.92.
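The frequency-table method above can be sketched in Python as follows. This is an illustrative sketch only; the dictionary `frequency_table` and the other variable names are assumptions introduced here, with the scores and frequencies copied from the table in this worked example.

```python
import math

# Frequency table from the worked example: golf score -> frequency.
frequency_table = {87: 1, 88: 2, 89: 1, 90: 3, 92: 1, 93: 2, 95: 1, 97: 1}

n = sum(frequency_table.values())                          # sum of f
sum_fx = sum(x * f for x, f in frequency_table.items())    # sum of f * x
mean = sum_fx / n

# Sum of f * (x - mean)^2, the total of the last column of the table.
sum_f_sq_dev = sum(f * (x - mean) ** 2 for x, f in frequency_table.items())
sigma = math.sqrt(sum_f_sq_dev / n)

print(n, sum_fx, mean, sum_f_sq_dev, round(sigma, 2))
```

Each printed value corresponds to a total in the table: 12, 1092, the mean 91, the sum 102 and the standard deviation 2.92.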



TI | THINK
1. In a new problem, on a Lists & Spreadsheet page, label column A as ‘score’ and label column B as ‘f’. Enter the values and the frequency corresponding to each score as shown in the table. Press ENTER after each value.
2. To find all the summary statistics, open a Calculator page and press:
   • MENU
   • 6: Statistics
   • 1: Stat Calculations
   • 1: One Variable Statistics...
   Select 1 as the number of lists, then on the One-Variable Statistics page, select ‘score’ as the X1 List and ‘f’ as the Frequency List. Leave the next two fields empty, TAB to OK and press ENTER.
DISPLAY/WRITE
The mean is $\bar{x} = 91$ and the population standard deviation is $\sigma = 2.92$ correct to 2 decimal places.

CASIO | THINK
1. On the Statistic screen, label list1 as ‘score’ and enter the values from the question. Press EXE after entering each value. To find the summary statistics, tap:
   • Calc
   • One-Variable
   Set values as:
   • XList: main\score
   • Freq: 1
2. Tap OK. The standard deviation is shown as $\sigma_x$ and the mean is shown as $\bar{x}$.
DISPLAY/WRITE
The mean is $\bar{x} = 91$ and the population standard deviation is $\sigma = 2.92$ correct to 2 decimal places.

Why the deviations are squared


When you take an entire data set, the sum of the deviations from the mean is zero, that is, $\sum (x - \bar{x}) = 0$.
When the data value is less than the mean ($x < \bar{x}$), the deviation is negative.

When the data value is greater than the mean ($x > \bar{x}$), the deviation is positive.

• The negative and positive deviations cancel each other out; therefore, calculating the sum and average of the deviations is not useful.
• By squaring all of the deviations, each deviation becomes positive, so the average of the deviations becomes meaningful.

This explains why the standard deviation is calculated using the squares of the deviations, $(x - \bar{x})^2$, for all data values.
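The cancellation described above can be checked numerically. The following is a small sketch, using Lucy's 12 golf scores from Worked example 11 as the data; the variable names are illustrative assumptions.

```python
# Lucy's 12 golf scores from Worked example 11.
scores = [87, 88, 88, 89, 90, 90, 90, 92, 93, 93, 95, 97]

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

# Negative and positive deviations cancel, so their sum carries no information.
sum_of_deviations = sum(deviations)

# Squared deviations cannot be negative, so their sum measures spread.
sum_of_squared_deviations = sum(d ** 2 for d in deviations)

print(sum_of_deviations, sum_of_squared_deviations)
```

The first value is zero, whatever the data, while the second is the total 102 used in the standard deviation calculation.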

Standard deviations of populations and samples


• So far we have calculated the standard deviation for a population of data, that is, for complete sets of data.
There is another formula for calculating standard deviation for samples of data, that is, data that have been
randomly selected from a larger population.
• The sample standard deviation is more commonly used in day-to-day life, as it is usually impossible to
collect data from an entire population.
For example, if you wanted to know how much time Year 10 students across the country spend on social
media, you would not be able to collect data from every student in the country. You would have to take a
sample instead.
• The sample standard deviation is denoted by the letter s, and can be calculated using the following formula.



Sample standard deviation formula

$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$$

• Calculators usually display both values for the standard deviation, so it is important to understand the
difference between them.
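The difference between the two values can be seen with Python's standard `statistics` module, where `pstdev` divides by n and `stdev` divides by n − 1. In this sketch the golf scores from Worked example 11 are used for both calculations purely for comparison; treating them as a sample is an assumption made only for illustration.

```python
import statistics

# Lucy's 12 golf scores from Worked example 11.
scores = [87, 88, 88, 89, 90, 90, 90, 92, 93, 93, 95, 97]

population_sd = statistics.pstdev(scores)  # divides the sum of squares by n
sample_sd = statistics.stdev(scores)       # divides the sum of squares by n - 1

print(round(population_sd, 2), round(sample_sd, 2))
```

Because the sample formula divides by a smaller number, the sample standard deviation is always slightly larger than the population standard deviation for the same data.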

12.5.2 Effects on standard deviation


eles-4959
• The standard deviation is affected by extreme values.

WORKED EXAMPLE 12 Interpreting the effects on standard deviation

On a particular day Lucy played golf brilliantly and scored 60.


The scores in her previous 12 games of golf were 87, 88, 88, 89, 90, 90,
90, 92, 93, 93, 95 and 97 (see Worked example 11).
Comment on the effect this latest score has on the standard deviation.

THINK WRITE
1. Use a calculator to calculate the mean and the standard deviation.
$\bar{x} = 88.6154 \approx 88.62$
$\sigma = 8.7225 \approx 8.72$
2. Interpret the result and compare it to the In the first 12 games Lucy’s mean score was 91 with
results found in Worked example 11. a standard deviation of 2.92. This implied that Lucy’s
scores on average were 2.92 either side of her average
of 91. Lucy’s latest performance resulted in a mean
score of 88.62 with a standard deviation of 8.72.
This indicates a slightly lower mean score, but the
much higher standard deviation indicates that the data
are now much more spread out.
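The effect of the extreme score can be reproduced with a short sketch. The variable names are illustrative assumptions; the data are the scores given in this worked example.

```python
import statistics

previous_scores = [87, 88, 88, 89, 90, 90, 90, 92, 93, 93, 95, 97]
with_latest = previous_scores + [60]   # add the extreme score of 60

mean_before = statistics.mean(previous_scores)
sd_before = statistics.pstdev(previous_scores)
mean_after = statistics.mean(with_latest)
sd_after = statistics.pstdev(with_latest)

print(round(mean_before, 2), round(sd_before, 2))
print(round(mean_after, 2), round(sd_after, 2))
```

A single extreme value moves the mean only a little but roughly triples the standard deviation.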

12.5.3 Properties of standard deviation


eles-4961
• If a constant c is added to all data values in a set, the deviations $(x - \bar{x})$ will remain unchanged and consequently the standard deviation remains unchanged.
• If all data values in a set are multiplied by a constant k, the deviations $(x - \bar{x})$ will be multiplied by k, that is $k(x - \bar{x})$; consequently the standard deviation is also multiplied by k (for positive k).


• The standard deviation can be used to measure consistency.
• When the standard deviation is low we are able to say that the scores in the data set are more consistent
with each other.



WORKED EXAMPLE 13 Calculating numerical changes to the standard deviation

For the data 5, 9, 6, 11, 10, 7:


a. calculate the standard deviation
b. calculate the standard deviation if 4 is added to each data value. Comment on the effect.
c. calculate the standard deviation if all data values are multiplied by 2. Comment on the effect.

THINK WRITE

a. 1. Calculate the mean.

$$\bar{x} = \frac{5 + 9 + 6 + 11 + 10 + 7}{6} = 8$$

2. Set up a frequency table and enter the squares of the deviations.

(x)      $(x - \bar{x})$    $(x - \bar{x})^2$
5        5 − 8 = −3         9
6        −2                 4
7        −1                 1
9        1                  1
10       2                  4
11       3                  9
Total                       $\sum (x - \bar{x})^2 = 28$

3. To calculate the standard deviation, apply the formula for standard deviation.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{28}{6}} \approx 2.16 \text{ (correct to 2 decimal places)}$$

b. 1. Add 4 to each data value in the set. 9, 13, 10, 15, 14, 11

2. Calculate the mean.

$$\bar{x} = \frac{9 + 13 + 10 + 15 + 14 + 11}{6} = 12$$

3. Set up a frequency table and enter the squares of the deviations.

(x)      $(x - \bar{x})$    $(x - \bar{x})^2$
9        9 − 12 = −3        9
10       −2                 4
11       −1                 1
13       1                  1
14       2                  4
15       3                  9
Total                       $\sum (x - \bar{x})^2 = 28$

4. To calculate the standard deviation, apply the formula for standard deviation.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{28}{6}} \approx 2.16 \text{ (correct to 2 decimal places)}$$



5. Comment on the effect of adding 4 to each data value. Adding 4 to each data value increased the mean but had no effect on the standard deviation, which remained at 2.16.

c. 1. Multiply each data value in the set by 2. 10, 18, 12, 22, 20, 14

2. Calculate the mean.

$$\bar{x} = \frac{10 + 18 + 12 + 22 + 20 + 14}{6} = 16$$

3. Set up a frequency table and enter the squares of the deviations.

(x)      $(x - \bar{x})$    $(x - \bar{x})^2$
10       10 − 16 = −6       36
12       −4                 16
14       −2                 4
18       2                  4
20       4                  16
22       6                  36
Total                       $\sum (x - \bar{x})^2 = 112$

4. To calculate the standard deviation, apply the formula for standard deviation.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{112}{6}} \approx 4.32 \text{ (correct to 2 decimal places)}$$

5. Comment on the effect of multiplying each Multiplying each data value by 2 doubled
data value by 2. the mean and doubled the standard deviation,
which changed from 2.16 to 4.32.
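The two properties demonstrated in this worked example can be checked in a few lines. This is a sketch under the assumption that the population standard deviation is wanted, as in the working above; the variable names are illustrative.

```python
import math
import statistics

data = [5, 9, 6, 11, 10, 7]
shifted = [x + 4 for x in data]   # add the constant 4 to every value
doubled = [x * 2 for x in data]   # multiply every value by 2

sd = statistics.pstdev(data)
sd_shifted = statistics.pstdev(shifted)
sd_doubled = statistics.pstdev(doubled)

print(round(sd, 2), round(sd_shifted, 2), round(sd_doubled, 2))
print(math.isclose(sd_shifted, sd), math.isclose(sd_doubled, 2 * sd))
```

`math.isclose` is used for the comparisons because floating-point results can differ in the last digit even when the underlying values are equal.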

12.5.4 Average absolute deviations


eles-3219
• The average absolute deviation is a measure of spread that indicates the average absolute distance between
data values and the centre of the distribution.
• The average absolute deviation may be calculated using the mean or the median.
• Begin by calculating the centre of the distribution and then calculate how far away each data point is from
the centre using positive distances. Finally, calculate the centre of the deviations.

WORKED EXAMPLE 14 Calculating the average absolute deviation

For the data 5, 3, 17, 1, 4:


a. calculate the median absolute deviation
b. calculate the mean absolute deviation.

THINK WRITE
a. 1. Put the data in ascending order. 1, 3, 4, 5, 17
2. Locate the median. 1, 3, 4, 5, 17; the middle value is 4, so the median is M = 4.



3. Calculate the absolute deviations by calculating how far away each data point is from the median, using positive distances.

x        $|x - M|$
1        3
3        1
4        0
5        1
17       13

4. Arrange the absolute deviations in ascending order. 0, 1, 1, 3, 13
5. Locate the median absolute deviation. 0, 1, 1, 3, 13
6. Write the answer. The median absolute deviation is 1.

b. 1. Calculate the mean of the data set.

$$\bar{x} = \frac{5 + 3 + 17 + 1 + 4}{5} = \frac{30}{5} = 6$$

2. Calculate the absolute deviations by calculating how far away each data point is from the mean, using positive distances.

x        $|x - \bar{x}|$
5        1
3        3
17       11
1        5
4        2

3. Calculate the mean absolute deviation.

$$\text{MAD} = \frac{1 + 3 + 11 + 5 + 2}{5} = \frac{22}{5} = 4.4$$
5

4. Write the answer. The mean absolute deviation is 4.4.
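Both average absolute deviations from this worked example can be computed with a short sketch. The variable names are illustrative assumptions; only the standard library is used.

```python
import statistics

data = [5, 3, 17, 1, 4]

# Median absolute deviation: the median of the distances from the median.
median = statistics.median(data)
distances_from_median = [abs(x - median) for x in data]
median_abs_dev = statistics.median(distances_from_median)

# Mean absolute deviation: the mean of the distances from the mean.
mean = sum(data) / len(data)
distances_from_mean = [abs(x - mean) for x in data]
mean_abs_dev = sum(distances_from_mean) / len(data)

print(median_abs_dev, mean_abs_dev)
```

The value 17 has a large effect on the mean absolute deviation (4.4) but almost none on the median absolute deviation (1).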

Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: The standard deviation (int-4624)
The standard deviation for a sample (int-4814)



Exercise 12.5 The standard deviation (Optional)
12.5 Quick quiz 12.5 Exercise

Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 2, 6, 10, 11, 12, 15 3, 4, 7, 9, 13, 16 5, 8, 14, 17

Fluency
1. WE10 Calculate the standard deviation of each of the following data sets, correct to 2 decimal places.
a. 3, 5, 8, 2, 7, 1, 6, 5 b. 11, 8, 7, 12, 10, 11, 14
c. 25, 15, 78, 35, 56, 41, 17, 24 d. 5.2, 4.7, 5.1, 12.6, 4.8

2. WE14a Calculate the median absolute deviation for each of the data sets in question 1.
3. WE14b Calculate the mean absolute deviation for each of the following data sets, correct to 2 decimal places.
a. 3, 5, 8, 2, 7, 1, 6, 5
b. 25, 15, 78, 35, 56, 41, 17, 24

4. Calculate the standard deviation of each of the following data sets, correct to 2 decimal places.

a. Score (x)    Frequency (f)
   1            1
   2            5
   3            9
   4            7
   5            3

b. Score (x)    Frequency (f)
   16           15
   17           24
   18           26
   19           28
   20           27

c. Score (x)    Frequency (f)
   8            15
   10           19
   12           18
   14           7
   16           6
   18           2

d. Score (x)    Frequency (f)
   65           15
   66           15
   67           16
   68           17
   69           16
   70           15
   71           15
   72           12

5. Complete the following frequency distribution table and use it to calculate the standard deviation of the data
set, correct to 2 decimal places.

Class Class centre (x) Frequency ( f)


1–10 6
11–20 15
21–30 25
31–40 8
41–50 6



Understanding
6. WE11 First-quarter profit increases for 8 leading companies are given below as percentages.
2.3, 0.8, 1.6, 2.1, 1.7, 1.3, 1.4, 1.9
Calculate the mean score and the standard deviation for this set of data. Express your answers correct to
2 decimal places.
7. The heights in metres of a group of army recruits are given.

1.8, 1.95, 1.87, 1.77, 1.75, 1.79, 1.81, 1.83, 1.76, 1.80, 1.92, 1.87, 1.85, 1.83
Calculate the mean score and the standard deviation for this set of data. Express your answers correct to
2 decimal places.

8. Times (to the nearest tenth of a second) for the heats in the open 100-m sprint at the school sports are given in the stem-and-leaf plot shown.
Calculate the standard deviation for this set of data and express your answer correct to 2 decimal places.

Key: 11|0 = 11.0 s
Stem | Leaf
11 | 0
11 | 2 3
11 | 4 4 5
11 | 6 6
11 | 8 8 9
12 | 0 1
12 | 2 2 3
12 | 4 4
12 | 6
12 | 9

9. The number of outgoing phone calls from an office each day over a 4-week period is shown in the stem-and-
leaf plot.
Key: 1|3 = 13 calls
Stem Leaf
0 89
1 3479
2 01377
3 34
4 15678
5 38
Calculate the standard deviation for this set of data and express your answer correct to 2 decimal places.



10. MC A new legal aid service has been operational for only 5 weeks. The number of people who have made use of the service each day during this period is set out in the stem-and-leaf plot shown.

Key: 1|6 = 16 people
Stem | Leaf
0 | 24
0 | 779
1 | 014444
1 | 5667889
2 | 122333
2 | 7

The standard deviation (to 2 decimal places) of these data is:
A. 6.00
B. 6.34
C. 6.47
D. 15.44
E. 9.37

11. WE12 The speeds, in km/h, of the first 25 cars caught by a roadside speed camera on a particular day were:
82, 82, 84, 84, 84, 84, 85, 85, 85, 86, 86, 87, 89, 89, 89, 90, 91, 91, 92, 94, 95, 96, 99, 100, 102
The next car that passed the speed camera was travelling at 140 km/h.
Comment on the effect of the speed of this last car on the standard deviation for the data.

Reasoning
12. Explain what the standard deviation tells us about a set of data.

13. WE13 For the data 1, 4, 5, 9, 11:


a. calculate the standard deviation
b. calculate the standard deviation if 7 is added to each data value. Comment on the effect.
c. calculate the standard deviation if all data values are multiplied by 3. Comment on the effect.

14. Show using an example the effect, if any, on the standard deviation of adding a data value to a set of data
that is equivalent to the mean.

Problem solving
15. If the mean for a set of data is 45 and the standard deviation is 6, determine how many standard deviations
above the mean is a data value of 57.
16. Five numbers a, b, c, d and e have a mean of 12 and a standard deviation of 4.
a. If each number is increased by 3, calculate the new mean and standard deviation.
b. If each number is multiplied by 3, calculate the new mean and standard deviation.

17. Twenty-five students sat a test and the results for 24 of the students are given in the following stem-and-leaf
plot.
Key: 1|2 = 12 marks
Stem Leaf
0 89
1 123789
2 23568
3 012468
4 02568

a. If the average mark for the test was 27.84, determine the mark obtained by the 25th student.
b. Determine how many students scored higher than the median score.
c. Calculate the standard deviation of the marks, giving your answer correct to 2 decimal places.



LESSON
12.6 Comparing data sets
LEARNING INTENTIONS
At the end of this lesson you should be able to:
• choose an appropriate measure of centre and spread to analyse data
• interpret and make decisions based on measures of centre and spread.

12.6.1 Comparing data sets


eles-4962
• Decisions need to be made about which measure of centre and which measure of spread to use when
analysing and comparing data.
• The mean is calculated using every data value in the set. The median is the middle score of an ordered set
of data, so it is a more useful measure of centre when a set of data contains outliers.
• The range is determined by calculating the difference between the maximum and minimum data values, so
it includes outliers. It is useful, however, when we are interested in extreme values such as high and low
tides or maximum and minimum temperatures.
• The interquartile range is the difference between the upper and lower quartiles, so it does not include every
data value in its calculation, but it will overcome the problem of outliers skewing data.
• The standard deviation is calculated using every data value in the set. It measures the spread around
the mean.
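The points above about outliers can be illustrated with a sketch that compares measures of centre and spread with and without one extreme value. The data are Lucy's golf scores from Worked example 11 plus the score of 60 from Worked example 12. The helper `range_and_iqr` is hypothetical, and it assumes the quartiles are found as the medians of the lower and upper halves of the ordered data; other quartile conventions exist and can give slightly different values.

```python
import statistics

def range_and_iqr(values):
    """Return (range, interquartile range) using medians of the two halves."""
    ordered = sorted(values)
    half = len(ordered) // 2                  # the middle value is left out when n is odd
    q1 = statistics.median(ordered[:half])    # lower quartile
    q3 = statistics.median(ordered[-half:])   # upper quartile
    return ordered[-1] - ordered[0], q3 - q1

scores = [87, 88, 88, 89, 90, 90, 90, 92, 93, 93, 95, 97]
with_outlier = scores + [60]

mean_before, mean_after = statistics.mean(scores), statistics.mean(with_outlier)
median_before, median_after = statistics.median(scores), statistics.median(with_outlier)
range_before, iqr_before = range_and_iqr(scores)
range_after, iqr_after = range_and_iqr(with_outlier)

print(mean_before, median_before, range_before, iqr_before)
print(round(mean_after, 2), median_after, range_after, iqr_after)
```

In this example the outlier changes the mean and the range noticeably, while the median is unchanged and the interquartile range moves only slightly.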

WORKED EXAMPLE 15 Interpreting mean and standard deviation

For the two sets of data 6, 7, 8, 9, 10 and 12, 4, 10, 11, 3:


a. calculate the mean
b. calculate the standard deviation
c. comment on the similarities and differences of each statistic.

THINK WRITE

a. 1. Calculate the mean of the first set of data.

$$\bar{x}_1 = \frac{6 + 7 + 8 + 9 + 10}{5} = 8$$

2. Calculate the mean of the second set of data.

$$\bar{x}_2 = \frac{12 + 4 + 10 + 11 + 3}{5} = 8$$

b. 1. Calculate the standard deviation of the first set of data.

$$\sigma_1 = \sqrt{\frac{(6 - 8)^2 + (7 - 8)^2 + (8 - 8)^2 + (9 - 8)^2 + (10 - 8)^2}{5}} \approx 1.41$$

2. Calculate the standard deviation of the second set of data.

$$\sigma_2 = \sqrt{\frac{(12 - 8)^2 + (4 - 8)^2 + (10 - 8)^2 + (11 - 8)^2 + (3 - 8)^2}{5}} \approx 3.74$$



c. Comment on the findings. c. For both sets of data the mean was the same, 8. However,
the standard deviation for the second set (3.74) was much
higher than the standard deviation of the first set (1.41),
implying that the second set is more widely distributed
than the first.
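The comparison in this worked example can be sketched by writing out the standard deviation formula directly. The function name `mean_and_sd` is a hypothetical helper introduced for illustration.

```python
import math

def mean_and_sd(values):
    """Return the mean and the population standard deviation of a data set."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, math.sqrt(variance)

set_1 = [6, 7, 8, 9, 10]
set_2 = [12, 4, 10, 11, 3]

mean_1, sd_1 = mean_and_sd(set_1)
mean_2, sd_2 = mean_and_sd(set_2)

print(mean_1, round(sd_1, 2))
print(mean_2, round(sd_2, 2))
```

Two data sets with the same mean can still have very different spreads, which is why a measure of centre alone is not enough to compare them.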

• When multiple data displays are used to display similar sets of data, comparisons and conclusions can then
be drawn about the data.
• We can use back-to-back stem-and-leaf plots and parallel box plots to help compare statistics such as
the median, range and interquartile range.

[Figure: examples of a back-to-back stem-and-leaf plot and parallel box plots]

WORKED EXAMPLE 16 Comparing data sets

Below are the scores achieved by two students in eight Mathematics tests throughout the year.
John: 45, 62, 64, 55, 58, 51, 59, 62
Penny: 84, 37, 45, 80, 74, 44, 46, 50
a. Identify the student who performed better over the eight tests. Justify your answer.
b. Identify the student who was more consistent over the eight tests. Justify your answer.

THINK WRITE

a. Compare the mean for each student. The student with the higher mean performed better overall.
   John: $\bar{x} = 57$, $\sigma = 6$
   Penny: $\bar{x} = 57.5$, $\sigma = 17.4$
   Penny performed slightly better on average as her mean mark was higher than John’s.
b. Compare the standard deviation for each student. The student with the lower standard deviation performed more consistently.
   John was the more consistent student because his standard deviation was much lower than Penny’s. This means that his test results were closer to his mean score than Penny’s were to hers.
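The decision rule used in this worked example (higher mean for better performance, lower standard deviation for greater consistency) can be sketched as follows. The dictionary layout and variable names are assumptions made for illustration.

```python
import statistics

marks = {
    "John": [45, 62, 64, 55, 58, 51, 59, 62],
    "Penny": [84, 37, 45, 80, 74, 44, 46, 50],
}

# For each student, store (mean, population standard deviation).
summary = {
    name: (statistics.mean(values), statistics.pstdev(values))
    for name, values in marks.items()
}

best_average = max(summary, key=lambda name: summary[name][0])     # highest mean
most_consistent = min(summary, key=lambda name: summary[name][1])  # lowest spread

for name, (mean, sd) in summary.items():
    print(name, mean, round(sd, 1))
print(best_average, most_consistent)
```

The same pattern extends to more than two students, since `max` and `min` simply scan every entry in `summary`.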

TI | THINK
a, b.
1. In a new problem, on a Lists & Spreadsheet page, label column A as ‘john’ and column B as ‘penny’. Enter the data sets from the question. Press ENTER after each value.
2. To calculate only the mean and standard deviation of each data set, open a Calculator page and complete the entry lines as:
   mean(john)
   stDevPop(john)
   mean(penny)
   stDevPop(penny)
   Press CTRL ENTER after each entry to get a decimal approximation.
   DISPLAY/WRITE
   John: $\bar{x} = 57$, $\sigma = 6$
   Penny: $\bar{x} = 57.5$, $\sigma = 17.4$ correct to 1 decimal place.
3. To draw the two box plots on the same Data & Statistics page, press TAB to locate the label of the horizontal axis and select the variable ‘john’. Then press:
   • MENU
   • 1: Plot Type
   • 2: Box Plot
   Then press:
   • MENU
   • 2: Plot Properties
   • 5: Add X-variable and select ‘penny’.
   To change the colour, place the pointer over one of the data points. Then press CTRL MENU. Then press:
   • 6: Color
   • 2: Fill Color
   Select whichever colour you like from the palette for each of the box plots.
   DISPLAY/WRITE
   Penny performed slightly better overall as her mean mark was higher than John’s; however, John was more consistent as his standard deviation was lower than Penny’s.

CASIO | THINK
a, b.
1. On the Statistics screen, label list1 as ‘John’ and list2 as ‘Penny’. Enter the data set as shown. Press EXE after each value.
2. To calculate the mean and standard deviation of each data set, tap:
   • Calc
   • Two-Variable
   Set values as:
   • XList: main\John
   • YList: main\Penny
   • Freq: 1
   Tap OK. The x-values relate to John and the y-values to Penny. Scroll down to see all the statistics.
   DISPLAY/WRITE
   John: $\bar{x} = 57$, $\sigma = 6$
   Penny: $\bar{x} = 57.5$, $\sigma = 17.4$ correct to 1 decimal place.
3. To draw the two box-and-whisker plots on the same Statistics screen, tap:
   • SetGraph
   • Setting...
   Set values as:
   • Type: MedBox
   • XList: main\John
   • Freq: 1
   Tap 2 in the row of numbers at the top of the screen. Set values as:
   • Type: MedBox
   • XList: main\Penny
   • Freq: 1
   Tap Set. Tap SetGraph and tick StatGraph1 and StatGraph2. Tap the graphing icon to display the graphs.
   DISPLAY/WRITE
   Penny performed slightly better overall as her mean mark was higher than John’s; however, John was more consistent as his standard deviation was lower than Penny’s.

Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: Comparing data sets (int-4625)
Back-to-back stem plots (int-6252)



Exercise 12.6 Comparing data sets
12.6 Quick quiz 12.6 Exercise

Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 5, 8, 10, 11, 17 2, 4, 6, 9, 12, 13, 18 3, 7, 14, 15, 16, 19, 20

Fluency
1. WE15 For the two sets of data, 65, 67, 61, 63, 62, 60 and 56, 70, 65, 72, 60, 55:
a. calculate the mean
b. calculate the standard deviation
c. comment on the similarities and differences.

2. A bank surveys the average morning and afternoon waiting times for customers. The figures were taken each
Monday to Friday in the morning and afternoon for one month. The stem-and-leaf plot below shows
the results.

Key: 1|2 = 1.2 minutes


Leaf: Morning | Stem | Leaf: Afternoon
7 | 0 | 788
86311 | 1 | 1124456667
9666554331 | 2 | 2558
952 | 3 | 16
5 | 4 |
 | 5 | 7

a. Identify the median morning waiting time and the median afternoon waiting time.
b. Calculate the range for morning waiting times and the range for afternoon waiting times.
c. Use the information given in the display to comment about the average waiting time at the bank in the
morning compared with the afternoon.
3. In a class of 30 students there are 15 boys and 15 girls. Their heights are measured in metres and are
listed below.
Boys: 1.65, 1.71, 1.59, 1.74, 1.66, 1.69, 1.72, 1.66, 1.65, 1.64, 1.68, 1.74, 1.57, 1.59, 1.60
Girls: 1.66, 1.69, 1.58, 1.55, 1.51, 1.56, 1.64, 1.69, 1.70, 1.57, 1.52, 1.58, 1.64, 1.68, 1.67

Display this information in a back-to-back stem-and-leaf plot and comment on their height distribution.
4. The stem-and-leaf plot shown is used to display the number of vehicles sold by the Ford and Hyundai dealerships in a Sydney suburb each week for a three-month period.

Key: 1|5 = 15 vehicles
Leaf: Ford | Stem | Leaf: Hyundai
74 | 0 | 39
952210 | 1 | 111668
8544 | 2 | 2279
0 | 3 | 5

a. State the median of both distributions.
b. Calculate the range of both distributions.
c. Calculate the interquartile range of both distributions.
d. Show both distributions on a box plot.



5. The box plot shown displays statistical data for two
AFL teams over a season.

[Figure: parallel box plots for the Sydney Swans and the Brisbane Lions; horizontal axis: Points, from 50 to 150]
a. State the team that had the higher median score.
b. Determine the range of scores for each team.
c. For each team calculate the interquartile range.

Understanding
6. Tanya measures the heights (in m) of a group of Year
10 boys and girls and produces the following five-point
summaries for each data set.
Boys: 1.45, 1.56, 1.62, 1.70, 1.81
Girls: 1.50, 1.55, 1.62, 1.66, 1.73
a. Draw a box plot for both sets of data and display them
on the same scale.
b. Calculate the median of each distribution.
c. Calculate the range of each distribution.
d. Calculate the interquartile range for each distribution.
e. Comment on the spread of the heights among the boys
and the girls.

7. The box plots show the average daily sales of cold drinks at the school canteen in summer and winter.

[Figure: parallel box plots for Summer and Winter; horizontal axis: Daily sales of cold drinks, from 0 to 40]

a. Calculate the range of sales in both summer and winter.
b. Calculate the interquartile range of the sales in both summer and winter.
c. Comment on the relationship between the two data sets, both in terms of measures of centre and measures of spread.

8. MC Andrea surveys the age of people at two movies being shown at a local cinema. The box plot shows the results.

[Figure: parallel box plots for Movie A and Movie B; horizontal axis: Age, from 0 to 80]

Select which of the following conclusions could be drawn based on the information shown in the box plot.
A. Movie A attracts an older audience than Movie B.
B. Movie B attracts an older audience than Movie A.
C. Movie A appeals to a wider age group than Movie B.
D. Movie B appeals to a wider age group than Movie A.
E. Both movies appeal equally to the same age groups.



9. MC The figures below show the age of the first 10 men and women to finish a marathon.
Men: 28, 34, 25, 36, 25, 35, 22, 23, 40, 24
Women: 19, 27, 20, 26, 30, 18, 28, 25, 28, 22
Choose which of the following statements are correct.
Note: There may be more than one correct answer.
A. The mean age of the men is greater than the mean age of the women.
B. The range is greater among the men than among the women.
C. The interquartile range is greater among the men than among the women.
D. The standard deviation is greater among the men than among the women.
E. The standard deviation is less among the men than among the women.

Reasoning
10. WE16 Cory recorded his marks for each test that he did in English and Science throughout the year.
English: 55, 64, 59, 56, 62, 54, 65, 50
Science: 35, 75, 81, 32, 37, 62, 77, 75
a. Identify the subject in which Cory received a better average. Justify your answer.
b. Identify the subject in which Cory performed more consistently. Justify your answer.

11. Draw an example of a graph that is:


a. symmetrical
b. positively skewed with one mode
c. negatively skewed with two modes.

12. The police set up two radar speed checks on a back street of Sydney and on a main road. In both places the
speed limit is 60 km/h. The results of the first 10 cars that have their speed checked are given below.
Back street: 60, 62, 58, 55, 59, 56, 65, 70, 61, 64
Main road: 55, 58, 59, 50, 40, 90, 54, 62, 60, 60
a. Calculate the mean and standard deviation of the readings taken at each point.
b. Identify the road where drivers are generally driving faster. Justify your answer.
c. Identify the road where the spread of readings is greater. Justify your answer.

13. In boxes of Smarties it is advertised that there are 50 Smarties in each box. Two machines are used to
distribute the Smarties into the boxes. The results from a sample taken from each machine are shown in the
stem-and-leaf plot.
Key: 5|1 = 51 5 ∗ |6 = 56
Leaf: Machine A Stem Leaf: Machine B
4 4
99877665 4∗ 57899999999
43222111000000 5 0000011111223
55 5∗ 9

a. Display the data from both machines on parallel box plots.


b. Calculate the mean and standard deviation of the number of Smarties distributed from both machines.
c. State which machine is the more dependable. Justify your answer.



14. Nathan and Timana are wingers in their local rugby league team. The
number of tries they have scored in each season are listed below.
Nathan: 25, 23, 13, 36, 1, 8, 0, 9, 16, 20
Timana: 5, 10, 12, 14, 18, 11, 8, 14, 12, 19
a. Calculate the mean number of tries scored by each player.
b. Calculate the range of tries scored by each player. Justify your answer.
c. Calculate the interquartile range of tries scored by each player. Justify
your answer.
d. State which player would you consider to be the more consistent. Justify
your answer.

15. Year 10 students at Merrigong High School sit exams in


Science and Maths. The results are shown in the table.

Mark Number of students in Science Number of students in Maths


51−60 7 6
61−70 10 7
71−80 8 12
81−90 8 9
91−100 2 6

a. Determine if either distribution is symmetrical.


b. If either distribution is not symmetrical, state whether it is positively or negatively skewed.
c. Discuss the possible reasons for any skewness.
d. State the modal class of each distribution.
e. Determine which subject has the greater standard deviation. Explain your answer.
16. A new drug for the relief of cold symptoms has been developed.
To test the drug, 40 people were exposed to a cold virus. Twenty
patients were then given a dose of the drug while another 20
patients were given a placebo.
(In medical tests a control group is often given a placebo drug.
The subjects in this group believe that they have been given the
real drug but in fact their dose contains no drug at all.)
All participants were then asked to indicate the time when they
first felt relief of symptoms. The number of hours from the time
the dose was administered to the time when the patients first felt
relief of symptoms are detailed below.

Group A (drug)
25, 29, 32, 45, 18, 21, 37, 42, 62, 13,
42, 38, 44, 42, 35, 47, 62, 17, 34, 32
Group B (placebo)
25, 17, 35, 42, 35, 28, 20, 32, 38, 35,
34, 32, 25, 18, 22, 28, 21, 24, 32, 36
a. Display the data on a back-to-back stem-and-leaf plot.
b. Display the data for both groups on a parallel box plot.
c. Make comparisons of the data. Use statistics in your answer.
d. Explain if the drug works. Justify your answer.
e. Determine other considerations that should be taken into account when trying to draw conclusions from
an experiment of this type.



Problem solving
17. The heights of Year 10 and Year 12 students (to the nearest centimetre) are being investigated. The results of
some sample data are shown in the table.
Year 10 160 154 157 170 167 164 172 158 177 180 175 168 159 155 163 163 169 173 172 170
Year 12 160 172 185 163 177 190 183 181 176 188 168 167 166 177 173 172 179 175 174 180

a. Draw a back-to-back stem-and-leaf plot.


b. Draw a parallel box plot.
c. Comment on what the plots tell you about the heights of Year 10 and Year 12 students.

18. Kloe compares her English and Maths marks. The results of eight tests in each subject are shown below.
English: 76, 64, 90, 67, 83, 60, 85, 37
Maths: 80, 56, 92, 84, 65, 58, 55, 62
a. Calculate Kloe’s mean mark in each subject.
b. Calculate the range of marks in each subject.
c. Calculate the standard deviation of marks in each subject.
d. Based on the above data, determine the subject that Kloe has performed more consistently in.
19. A sample of 50 students was surveyed on whether they owned an iPad or a mobile phone. The results
showed that 38 per cent of the students owned both.
Sixty per cent of the students owned a mobile phone and there were four students who had an iPad only.
Evaluate the percentage of students that did not own a mobile phone or an iPad.
20. The life expectancy of non-Aboriginal and non–Torres Strait Islander people in Australian states and
territories is shown on the box plot below.

[Figure: box plot; horizontal axis: Life expectancy of non-Aboriginal and non–Torres Strait Islander people in Australian states and territories, from 70 to 85]

The life expectancies of Aboriginal and Torres Strait Islander people in each of the Australian states and
territories are 56, 58.4, 51.3, 57.8, 53.9, 55.4 and 61.0.
a. Draw parallel box plots on the same axes. Compare and comment on your results.
b. Comment on the advantage and disadvantage of using a box plot.



LESSON
12.7 Populations and samples
LEARNING INTENTIONS
At the end of this lesson you should be able to:
• describe the difference between populations and samples
• recognise the difference between a census and a survey and identify a preferred method in different
circumstances.

12.7.1 Populations
eles-4946
• The term population refers to a complete set of individuals,
objects or events belonging to some category.
• When data are collected from a whole population, the
process is known as a census.
• It is often not possible, nor cost-effective, to conduct
a census.
• For this reason, samples have to be selected carefully from the population. A sample is a subset of a population.
[Figure: a sample (size n) shown as a subset of a population (size N)]

WORKED EXAMPLE 17 Identifying problems with collecting data on populations

List some problems you might encounter in trying to collect data on the population of possums in a local area.

THINK: Consider how the data might be collected and the problems in obtaining these data.
WRITE: Firstly, it would be almost impossible to find all of the possums in a local area in order to count them.
Secondly, possums are likely to stray into neighbouring areas, making it impossible to know if they belong to the area being observed.
Thirdly, possums are more active at night-time, making it harder to detect their presence.

12.7.2 Samples
• Surveys are conducted using samples. Ideally the sample should reveal generalisations about
the population.
• The sample selected to be surveyed should be chosen without bias, as this may result in a sample that is not
representative of the whole population.
For example, the students conducting the investigation decide to choose a sample of 12 fellow students.
Although it would be simplest to choose 12 of their friends as the sample, this would introduce bias since
they would not be representative of the population as a whole.
• A random sample is generally accepted as being an ideal representation of the population from which it
was drawn. However, it must be remembered that different random samples from the same population can
produce different results.
This means that we must be cautious about making predictions about a population, as results of surveys
conducted using random samples may vary.
• A sample size must be sufficiently large. As a general rule, the sample size should be about $\sqrt{N}$, where N is
the size of the population.
• If the sample size is too small, the conclusions that are drawn from the sample data may not reflect the
population as a whole.
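
The square-root rule of thumb for sample size can be checked quickly in code. The following is only a minimal sketch; the function name `suggested_sample_size` and the population of 400 are illustrative assumptions, not part of the text.

```python
import math


def suggested_sample_size(population_size: int) -> int:
    """Rule of thumb from the text: sample size is about the square root of N."""
    return round(math.sqrt(population_size))


# A population of 50 scores: sqrt(50) is about 7.1, so a sample of 7 is suggested.
print(suggested_sample_size(50))
# A hypothetical population of 400: sqrt(400) = 20.
print(suggested_sample_size(400))
```

The rule gives a starting point only; as the text notes, a sample that is too small may not reflect the population as a whole.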

WORKED EXAMPLE 18 Determining statistics from samples

A die was rolled 50 times and the following results were obtained.
6, 5, 3, 1, 6, 2, 3, 6, 2, 5, 3, 4, 1, 3, 2, 6, 4, 5, 5, 4, 3, 1, 2, 1, 6,
4, 5, 2, 3, 6, 1, 5, 3, 3, 2, 4, 1, 4, 2, 3, 2, 6, 3, 4, 6, 2, 1, 2, 4, 2

a. Determine the mean of the population (to 1 decimal place).
b. A suitable sample size for this population would be 7 ($\sqrt{50} \approx 7.1$).
i. Select a random sample of 7 scores, and determine the mean of these scores.
ii. Select a second random sample of 7 scores, and determine the mean of these.
iii. Select a third random sample of 20 scores, and determine the mean of these.
c. Comment on your answers to parts a and b.

a. THINK: Calculate the mean by first finding the sum of all the scores, then dividing by the number of scores (50).
   WRITE: Population mean = $\frac{\sum x}{n} = \frac{169}{50} \approx 3.4$

b. i. THINK: Use a calculator to randomly generate 7 scores from 1 to 50. Relate these numbers back to the scores, then calculate the mean.
      WRITE: The 7 scores randomly selected are numbers 17, 50, 40, 34, 48, 12, 19 in the set of 50 scores.
      These correspond to the scores: 4, 2, 3, 3, 2, 4, 5.
      The mean of these scores = $\frac{23}{7} \approx 3.3$.

   ii. THINK: Repeat b i to obtain a second set of 7 randomly selected scores. This second set of random numbers produced the number 1 twice. Try again. Another attempt produced the number 14 twice. Try again. A third attempt produced 7 different numbers. This set of 7 random numbers will then be used to, again, calculate the mean of the scores.
       WRITE: Ignore the second and third attempts to select 7 random numbers because of repeated numbers. The second set of 7 scores randomly selected is numbers 16, 49, 2, 42, 31, 11, 50 of the set of 50.
       These correspond to the scores: 6, 4, 5, 6, 1, 3, 2.
       The mean of these scores = $\frac{27}{7} \approx 3.9$.

   iii. THINK: Repeat for a randomly selected 20 scores.
        WRITE: The set of 20 randomly selected numbers produced a total of 68.
        Mean of 20 random scores = $\frac{68}{20} = 3.4$

c. THINK: Comment on the results.
   WRITE: The population mean is 3.4.
   The means of the two samples of 7 are 3.3 and 3.9. This shows that, even though the samples are randomly selected, their calculated means may be different.
   The mean of the sample of 20 scores is 3.4.
   This indicates that by using a bigger sample the result is more accurate than those obtained with the smaller samples.
TI | THINK and DISPLAY/WRITE
a. 1. In a new document, on a Lists & Spreadsheet page, label column A as ‘die’. Enter the data from the question.
   2. Although you can find many summary statistics, to find the mean only, open a Calculator page and press:
      • MENU
      • 6: Statistics
      • 3: List Math
      • 3: Mean
      Press VAR and select ‘die’, then press CTRL ENTER to get a decimal approximation.
   DISPLAY/WRITE: The mean of the 50 die rolls is 3.38.
b. i–ii. To generate a random sample of 7 scores, on the Calculator page, press:
      • MENU
      • 5: Probability
      • 4: Random
      • 2: Integer
      Type the entry line as:
      randInt(1,6,7)
      Then press ENTER. This randomly generates 7 numbers between 1 and 6.
      Press:
      • MENU
      • 6: Statistics
      • 3: List Math
      • 3: Mean
      Complete the entry line as:
      mean(ans)
      Then press CTRL ENTER to get a decimal approximation of the mean.
      Repeat this with another set of random numbers.
   DISPLAY/WRITE: The mean of the first sample of 7 rolls is 3.71, and the mean of the second sample of 7 rolls is 3.29, correct to 2 decimal places.
   iii. To repeat the procedure with 20 randomly generated values, change the first entry line to:
      randInt(1,6,20)
      Follow the remaining steps to calculate the mean.
   DISPLAY/WRITE: The mean of the third sample of 20 rolls is 3.15.
c. DISPLAY/WRITE: This indicates that the results obtained from a bigger sample are more accurate than those from smaller samples.

CASIO | THINK and DISPLAY/WRITE
a. 1. On the Statistics screen, label list1 as ‘die’. Enter the data from the question. Press EXE after each value.
   2. To find the statistics summary, tap:
      • Calc
      • One-Variable
      Set:
      • XList: main\die
      • Freq: 1
      Tap OK.
   DISPLAY/WRITE: The mean of the 50 die rolls is 3.38.
b. i–ii. To generate a random sample of 7 dice rolls, on the Main screen, type the entry line as:
      randList(7,1,6)
      Then press EXE. The randList can be found in the Catalog on the Keyboard. This randomly generates 7 numbers between 1 and 6.
      Tap:
      • Action
      • List
      • Statistics
      • Mean
      Highlight the random list, including the brackets, and drag it to the ‘mean(’. Close the bracket and press EXE.
      Repeat this with another set of random numbers.
   DISPLAY/WRITE: The mean of the first sample of 7 rolls is 4.71; the mean of the second sample of 7 rolls is also 4.71, correct to 2 decimal places.
   iii. To repeat the procedure with 20 randomly generated values, change the first entry line to:
      randList(20,1,6)
      Follow the remaining steps to calculate the mean.
   DISPLAY/WRITE: The mean of the third sample of 20 rolls is 2.65.
c. DISPLAY/WRITE: This indicates that the results obtained from a bigger sample are more accurate than those from smaller samples.
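
The same experiment can also be run in a general-purpose language. The sketch below is one possible approach in Python, not the method used in the worked example: it treats the 50 recorded rolls as the population and draws samples of 7, 7 and 20 positions without repeats. The seed value and the helper name `sample_mean` are arbitrary assumptions, so the sample means it prints will differ from those shown above.

```python
import random
from statistics import mean

# The 50 recorded die rolls (the population in this example).
rolls = [6, 5, 3, 1, 6, 2, 3, 6, 2, 5, 3, 4, 1, 3, 2, 6, 4, 5, 5, 4, 3, 1, 2, 1, 6,
         4, 5, 2, 3, 6, 1, 5, 3, 3, 2, 4, 1, 4, 2, 3, 2, 6, 3, 4, 6, 2, 1, 2, 4, 2]

population_mean = mean(rolls)  # 169 / 50


def sample_mean(data, size, rng):
    """Mean of `size` scores drawn from different positions (no repeats)."""
    return mean(rng.sample(data, size))


rng = random.Random(2024)  # arbitrary seed (an assumption) so runs are repeatable
for size in (7, 7, 20):
    print("sample of", size, "has mean", round(sample_mean(rolls, size, rng), 2))
print("population mean", population_mean)
```

Changing the seed gives a different set of sample means, which is the point of the example: random samples from the same population vary, and larger samples tend to vary less.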



12.7.3 To sample or to conduct a census?
• The particular circumstances determine whether data are collected from a population, or from a sample of
the population.
For example, suppose you collected data on the height of every Year 10 student in your class. If your
class was the only Year 10 class in the school, your class would be the population. If, however, there were
several Year 10 classes in your school, your class would be a sample of the Year 10 population.
• Worked example 18 showed that different random samples can produce different results. For this reason,
it is important to acknowledge that there could be some uncertainty when using sample results to make
predictions about the population.

WORKED EXAMPLE 19 Stating if the information was obtained by census or survey

For each of the following situations, state whether the information was obtained by census or survey.
Justify why that particular method was used.
a. A roll call is conducted each morning at school to determine which students are absent.
b. TV ratings are collected from a selection of viewers to discover the popular TV shows.
c. Every hundredth light bulb off an assembly production line is tested to determine the life of
that type of light bulb.
d. A teacher records the examination results of her class.

a. THINK: Every student is recorded as being present or absent at the roll call.
   WRITE: This is a census. If the roll call only applied to a sample of the students, there would not be an accurate record of attendance at school. A census is essential in this case.
b. THINK: Only a selection of the TV audience contributed to these data.
   WRITE: This is a survey. To collect data from the whole viewer population would be time-consuming and expensive. For this reason, it is appropriate to select a sample to conduct the survey.
c. THINK: Only 1 bulb in every 100 is tested.
   WRITE: This is a survey. Light bulbs are tested to destruction (burn-out) to determine their life. If every bulb was tested in this way, there would be none left to sell! A survey on a sample is essential.
d. THINK: Every student’s result is recorded.
   WRITE: This is a census. It is essential to record the result of every student.

Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: Populations and samples (int-4629)
Sample sizes (int-6183)



Exercise 12.7 Populations and samples

Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 4, 5, 8, 11 2, 6, 9, 12 3, 7, 10, 13

Fluency
1. WE17 List some of the problems you might encounter in trying to collect data from the following populations.
a. The life of a laptop computer battery
b. The number of dogs in your neighbourhood
c. The number of fish for sale at the fish markets
d. The average number of pieces of popcorn in a bag of popcorn
2. WE18 The data below show the results of the rolled die from Worked example 18.
6, 5, 3, 1, 6, 2, 3, 6, 2, 5, 3, 4, 1, 3, 2, 6, 4, 5, 5, 4, 3, 1, 2, 1, 6,
4, 5, 2, 3, 6, 1, 5, 3, 3, 2, 4, 1, 4, 2, 3, 2, 6, 3, 4, 6, 2, 1, 2, 4, 2
The mean of the population is 3.4. Select your own samples for the following questions.
a. Select a random sample of 7 scores, and determine the mean of these scores.
b. Select a second random sample of 7 scores, and determine the mean of these.
c. Select a third random sample of 20 scores, and determine the mean of these.
d. Comment on your answers to parts a, b and c.
3. WE19 In each of the following scenarios, state whether the information was obtained by census or survey.
Justify why that particular method was used.
a. Seating for all passengers is recorded for each aeroplane flight.
b. Movie ratings are collected from a selection of viewers to discover the best movies for the week.
c. Every hundredth soft drink bottle off an assembly production line is measured to determine the volume of
its contents.
d. A car driving instructor records the number of hours each learner driver has spent driving.

4. For each of the following, state whether a census or a survey has been used.
a. Two hundred people in a shopping centre are asked to nominate the supermarket where they do most of
their grocery shopping.
b. To find the most popular new car on the road, new car buyers are asked what make and model
they purchased.
c. To find the most popular new car on the road, data are obtained from the transport department.
d. Your Year 10 Maths class completed a series of questions on the amount of maths homework for
Year 10 students.

Understanding
5. To conduct a statistical investigation, Gloria needs to obtain information from 630 students.
a. Determine the appropriate sample size.
b. Describe a method of generating a set of random numbers for this sample.



6. A local council wants the opinions of its residents regarding
its endeavours to establish a new sporting facility for the
community. It has specifically requested all residents over 10
years of age to respond to a set of on-line questions.
a. State if this is a census or a survey.
b. Determine what problems could be encountered when
collecting data this way.
7. A poll was conducted at a school a few days before the
election for Head Boy and Head Girl. After the election, it was
discovered that the polls were completely misleading.
Explain how this could have happened.

Reasoning
8. A sampling error is said to occur when results of a sample are different from those of the population from
which the sample was drawn. Discuss some factors which could introduce sampling errors.
9. Since 1961, a census has been conducted in Australia every 5 years. Some people object to the census on the
basis that their privacy is being invaded. Others say that the expense involved could be directed to a better
cause. Others say that a sample could obtain statistics which are just as accurate.
State your views on this. Justify your statements.
10. Australia has a very small population compared with other countries such as China and India. These are the
world’s most populous nations, so the problems we encounter in conducting a census in Australia would be
insignificant compared with those encountered in those countries.
Suggest what different problems authorities would come across when conducting a census in countries with
large populations.

Problem solving
11. The game of Lotto involves picking the same 6 numbers in the range 1 to 45 as have been randomly selected
by a machine containing 45 numbered balls. The balls are mixed thoroughly, then 8 balls are selected
representing the 6 main numbers, plus 2 extra numbers, called supplementary numbers.
Here is a list of the number of times each number had been drawn over a period of time, and also the number
of weeks since each particular number has been drawn.

Number of weeks since each number drawn

Number    1    2    3    4    5    6    7    8
Weeks     1    5    2    1    1    7    -    4
Number    9   10   11   12   13   14   15   16
Weeks     3    3    1    5    5    7    -    4
Number   17   18   19   20   21   22   23   24
Weeks     9    -    9    2    2   12   10    8
Number   25   26   27   28   29   30   31   32
Weeks     5   11   17    2    3    3    -   22
Number   33   34   35   36   37   38   39   40
Weeks     4    3    -    1   12    -    6    -
Number   41   42   43   44   45
Weeks     6    1    7    -   31

Number of times each number drawn since draw 413

Number    1    2    3    4    5    6    7    8
Times   246  238  244  227  249  241  253  266
Number    9   10   11   12   13   14   15   16
Times   228  213  250  233  224  221  240  223
Number   17   18   19   20   21   22   23   24
Times   217  233  240  226  238  240  253  228
Number   25   26   27   28   29   30   31   32
Times   252  239  198  229  227  204  230  226
Number   33   34   35   36   37   38   39   40
Times   246  233  232  251  222  221  219  259
Number   41   42   43   44   45
Times   245  242  237  221  224

If these numbers are randomly chosen, explain the differences shown in the tables.



12. A sample of 30 people was selected at random from those attending a
local swimming pool. Their ages (in years) were recorded as follows.
19, 7, 58, 41, 17, 23,
62, 55, 40, 37, 32, 29,
21, 18, 16, 10, 40, 36,
33, 59, 65, 68, 15, 9,
20, 29, 38, 24, 10, 30

a. Calculate the mean and the median age of the people in this sample.
b. Group the data into class intervals (0–9 etc.) and complete the
frequency distribution table.
c. Use the frequency distribution table to estimate the mean age.
d. Calculate the cumulative frequency and hence plot the ogive.
e. Estimate the median age from the ogive.
f. Compare the mean and median of the original data in part a with the
estimates of the mean and the median obtained for the grouped data in
parts c and e.
g. Were the estimates good enough? Explain your answer.

13. The typing speed (words per minute) was recorded for a group of Year 8 and Year 10 students. The results
are displayed in this back-to-back stem plot.
Key: 2|6 = 26 wpm
Leaf: Year 8 | Stem | Leaf: Year 10
          99 |  0   |
     9865420 |  1   | 79
   988642100 |  2   | 23689
     9776410 |  3   | 02455788
       86520 |  4   | 1258899
             |  5   | 03578
             |  6   | 003
Write a report comparing the typing speeds of the two groups.



LESSON
12.8 Evaluating inquiry methods and statistical reports
LEARNING INTENTIONS
At the end of this lesson you should be able to:
• describe the difference between primary and secondary data collection
• identify misleading errors or graphical techniques in data
• evaluate the accuracy of a statistical report.

12.8.1 Data collection methods


• Data can be collected in different ways. The manner in which you collect data can affect the validity of the
results you determine.
• Primary data is collected firsthand by observation, measurement, survey, experiment or simulation, and is
owned by the data collector until it is published.
• Secondary data is obtained from external sources such as journals, newspapers, the internet, or any other
previously collected data.
When collecting primary data through experiments, or using secondary data, considerations need to be given to:
• Validity. Results of valid investigations are supported by other investigations.
• Reliability. Data is reliable if the same data can be collected when the investigation is repeated. If the data
cannot be repeated by other investigators, the data may not be valid or true.
• Accuracy and precision. When collecting primary data through experiments, the accuracy of the data is
how close it is to a known value. Data is precise when multiple measurements of the same quantity are
close to each other.
• Bias. When collecting primary data in surveys it is important to ensure your sample is representative of the
population you are studying. When using secondary data you should consider who collected the data, and
whether those researchers had any intentional or unintentional influence on the data.
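
The difference between accuracy and precision can be illustrated with a small calculation. The sketch below uses made-up measurements and a made-up known value (all numbers and names here are hypothetical): accuracy is judged by how far the average measurement is from the known value, and precision by how tightly the measurements cluster.

```python
from statistics import mean, stdev

KNOWN_VALUE = 9.8  # accepted value of the quantity being measured (hypothetical)

# Two made-up sets of repeated measurements of the same quantity.
set_a = [9.7, 9.9, 9.8, 9.8, 9.7]       # close to 9.8 and close to each other
set_b = [10.4, 10.5, 10.4, 10.5, 10.4]  # close to each other, but far from 9.8


def accuracy_error(measurements, known):
    """Distance of the average measurement from the known value (smaller = more accurate)."""
    return abs(mean(measurements) - known)


def spread(measurements):
    """Sample standard deviation of the measurements (smaller = more precise)."""
    return stdev(measurements)


for name, data in (("A", set_a), ("B", set_b)):
    print(name, "error:", round(accuracy_error(data, KNOWN_VALUE), 2),
          "spread:", round(spread(data), 2))
```

Both sets are precise, since each has a small spread, but only set A is accurate. Set B could arise from an instrument that consistently reads too high.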

WORKED EXAMPLE 20 Choosing appropriate collection methods

You have been given an assignment to investigate which year level uses the school library, after school, the most.
a. Explain whether it is more appropriate to use primary
or secondary data in this case. Justify your choice.
b. Describe how the data could be collected. Discuss any
problems which might be encountered.

a. THINK: No records have been kept on library use.
   WRITE: Since records are not kept on the library use, secondary data is not an option.
   Therefore, primary data is more appropriate to use in this case.
b. THINK: The data can be collected via a questionnaire or in person.
   WRITE: A questionnaire could be designed and distributed to a randomly chosen sample. The problem here would be the non-return of the forms.
   Alternatively, students could be interviewed in person as they entered the library. This would take more time, but random interview times could be selected.

WORKED EXAMPLE 21 Choosing appropriate collection methods

State which method would be the most appropriate to collect the following data. Suggest an
alternative method in each case.
a. The number of cars parked in the staff car park each day.
b. The mass of books students carry to school each day.
c. The length a spring stretches when weights are added to it.
d. The cost of mobile phone plans with various network providers.

THINK WRITE
a. Observation a. The best way would probably be observation by visiting the staff car
park to count the number of cars there.
An alternative method would be to conduct a census of all workers to ask
if they parked in the staff car park. This method may be prone to errors
as it relies on accurate reporting by many people.
b. Measurement b. The mass of the books could be measured by weighing each student’s
pack on scales.
A random sample would probably yield a reasonably accurate result.
c. Experiment c. Conduct an experiment and measure the extension of the spring with
various weights.
There are probably no alternatives to this method as results will depend
upon the type of spring used.
d. Internet search d. An internet search would enable data to be collected.
Alternatively, a visit to mobile phone outlets would yield similar results.

12.8.2 Analysing statistical graphs


• Data can be graphed in a variety of ways — line graphs, bar graphs, histograms, stem plots, box plots, and
so on.
• Because graphs give a quick visual impression, the temptation is to not look at them in great detail. Often
graphs can be quite misleading.
• It is easy to manipulate a graph to give an impression which is supported by the creator of the graph. This is
achieved by careful choice of scale on the horizontal and vertical axes.
• Shortening the horizontal axis tends to highlight the increasing/decreasing nature of the trend of the
graph. Lengthening the vertical axis tends to have the same effect.
• Lengthening the horizontal and shortening the vertical axes tends to level out the trends.
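
The effect of axis lengths on the apparent steepness of a trend can be worked out numerically. The sketch below is an illustration only, with hypothetical numbers and a hypothetical helper `drawn_slope`: the data stay the same in every case, and only the physical length of each axis on the page changes.

```python
def drawn_slope(rise, run, y_span, x_span, height_cm, width_cm):
    """Slope of a trend line as it appears on paper (cm up per cm across).

    rise, run           : change in the data values (vertical, horizontal)
    y_span, x_span      : range of values covered by each axis
    height_cm, width_cm : physical length of each axis on the page
    """
    cm_up = rise / y_span * height_cm
    cm_across = run / x_span * width_cm
    return cm_up / cm_across


# Same data in each case: a rise of 5 units over 10 time periods.
normal = drawn_slope(5, 10, y_span=20, x_span=10, height_cm=10, width_cm=10)
squeezed = drawn_slope(5, 10, y_span=20, x_span=10, height_cm=10, width_cm=5)
stretched = drawn_slope(5, 10, y_span=20, x_span=10, height_cm=20, width_cm=10)
flattened = drawn_slope(5, 10, y_span=20, x_span=10, height_cm=5, width_cm=20)

print(normal, squeezed, stretched, flattened)
```

Halving the horizontal axis or doubling the vertical axis doubles the drawn slope, while lengthening the horizontal axis and shortening the vertical axis makes the same trend look almost flat.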



WORKED EXAMPLE 22 Observing the effect of changing scales on bar graphs

The report shows the annual change in median house prices in the local government areas (LGA) of Queensland from 2019–20 to 2020–21.
a. Draw a bar graph which would give the impression that the percentage annual change was much the same throughout the whole state.
b. Construct a bar graph to give the impression that the percentage annual change in Brisbane was far greater than that in the other local government areas.

HOUSES
                         Median house price     Annual
Suburb/locality          2020–21     2019–20    change
Brisbane (LGA)           $700,000    $627,000   11.6%
Ipswich City (LGA)       $323,000    $310,000    4.2%
Redland City (LGA)       $467,500    $435,000    7.5%
Logan City (LGA)         $360,000    $340,000    5.9%
Moreton Bay (LGA)        $399,000    $372,000    7.3%
Gold Coast City (LGA)    $505,000    $465,000    8.6%
Toowoomba (LGA)          $360,000    $334,500    7.6%
Sunshine Coast (LGA)     $470,000    $445,000    5.6%
Fraser Coast (LGA)       $322,500    $312,500    3.2%
Bundaberg (LGA)          $282,000    $275,000    2.5%
Gladstone (LGA)          $286,000    $286,000    0.0%
Rockhampton (LGA)        $267,000    $254,000    5.1%
Mackay (LGA)             $398,000    $383,000    3.9%
Townsville City (LGA)    $375,000    $359,000    4.5%
Cairns (LGA)             $400,000    $389,000    2.8%

a. THINK: To flatten out trends, lengthen the horizontal axis and shorten the vertical axis.
   WRITE/DRAW: [Bar graph: % house price changes in QLD 2019–20 to 2020–21. Vertical axis: Annual % change, marked at 0 and 10 only. Horizontal axis: Area, listing the 15 local government areas from Brisbane to Cairns.]

b. THINK: To accentuate trends, shorten the horizontal axis and lengthen the vertical axis.
   WRITE/DRAW: [Bar graph: % house price changes in QLD 2019–20 to 2020–21. Vertical axis: Annual % change, marked at 0 and then from 2 to 12 in steps of 1. Horizontal axis: Area, listing the 15 local government areas from Brisbane to Cairns.]

WORKED EXAMPLE 23 Comparing and choosing statistical measures

Consider the data displayed in the table of Worked example 22. Use the data collected for the median
house prices in 2020–21.
a. Explain whether the data would be classed as primary or secondary data.
b. Explain why the data shows median house prices rather than the mean or modal house price.
c. Calculate a measure of central tendency for the data. Explain the reason for this choice.
d. Give a measure of spread of the data, giving a reason for the particular choice.
e. Display the data in a graphical form, explaining why this particular form was chosen.

a. THINK: These are data that have been collected by someone else.
   WRITE: These are secondary data because they have been collected by someone else.
b. THINK: Median is the middle price, mean is the average price, and mode is the most frequently-occurring price.
   WRITE: The median price is the middle value. It is not affected by outliers as the mean is. The modal house price may only occur for two house sales with the same value. On the other hand, there may not be any mode.
   The median price is the most appropriate in this case.
c. THINK: Determine the measure of central tendency that is the most appropriate one.
   WRITE: The measures of central tendency are the mean, median and mode. The mean is affected by high values (e.g. $700 000) and low values (e.g. $282 000). These are not typical values, so the mean would not be appropriate.
   The mode is not useful either: $360 000 is the only repeated price, and it occurs just twice (Logan City and Toowoomba).
   The median house price is the most suitable measure of central tendency to represent the house prices in the Queensland local government areas. The median value is $375 000.
d. THINK: Consider the range and the interquartile range as measures of spread.
   WRITE: The five-number summary values are:
   Lowest score = $267 000
   Lower quartile = $322 500
   Median = $375 000
   Upper quartile = $467 500
   Highest score = $700 000
   Range = $700 000 − $267 000
         = $433 000
   Interquartile range = $467 500 − $322 500
                       = $145 000
   The interquartile range is a better measure of spread than the range, as the house prices form a cluster in this region.
e. THINK: Consider the graphing options.
   WRITE: Of all the graphing options, the box plot seems the most appropriate as it shows the spread of the prices as well as how they are grouped around the median price.
   [Box plot: Median house price 2020–21 ($); axis marked from 200 000 to 700 000 in steps of 100 000]
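
The five-number summary for the 2020–21 prices can be checked with a short program. This is a minimal sketch that assumes the quartiles are found as the medians of the lower and upper halves of the ordered data, leaving out the middle score when the number of scores is odd; other software may use a different quartile rule and give slightly different values. The function name `five_number_summary` is an illustrative choice.

```python
from statistics import median

# Median house prices for 2020-21, in the order given in the table.
prices = [700_000, 323_000, 467_500, 360_000, 399_000, 505_000, 360_000, 470_000,
          322_500, 282_000, 286_000, 267_000, 398_000, 375_000, 400_000]


def five_number_summary(data):
    """Return (lowest, Q1, median, Q3, highest).

    Quartiles are the medians of the lower and upper halves of the ordered
    data; the middle score is excluded from both halves when n is odd.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


lowest, q1, med, q3, highest = five_number_summary(prices)
print("five-number summary:", lowest, q1, med, q3, highest)
print("range:", highest - lowest)
print("interquartile range:", q3 - q1)
```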

WORKED EXAMPLE 24 Analysing data and use of statistics to interpret results

The following data is the heights of the members of the Australian women’s national basketball team
(in metres):
1.73, 1.65, 1.8, 1.83, 1.96, 1.88, 1.63, 1.88, 1.83, 1.88, 1.8, 1.96
Provide calculations and explanations as evidence to verify or refute the following statements.
a. The mean height of the team is greater than their median height.
b. The range of the heights of the 12 players is almost 3 times their interquartile range.
c. Only 5 players are on the court at any one time. A team of 5 players can be chosen such that their
mean, median and modal heights are all the same.

a. 1. THINK: Calculate the mean height of the 12 players.
      WRITE: Mean = $\frac{\sum x}{n} = \frac{21.83}{12} \approx 1.82$ m
   2. THINK: Order the heights to determine the median.
      WRITE: The heights of the players, in order, are:
      1.63, 1.65, 1.73, 1.8, 1.8, 1.83, 1.83, 1.88, 1.88, 1.88, 1.96, 1.96.
      There are 12 scores, so the median is the average of the 6th and 7th scores.
      Median = $\frac{1.83 + 1.83}{2} = 1.83$ m
   3. THINK: Comment on the statement.
      WRITE: The mean is 1.82 m, while the median is 1.83 m. This means that the mean is less than the median, so the statement is not true.

b. 1. THINK: Determine the range and the interquartile range of the 12 heights.
      WRITE: Range = 1.96 − 1.63 = 0.33 m
      The lower quartile is the average of the 3rd and 4th scores.
      Lower quartile = $\frac{1.73 + 1.8}{2} = 1.765$ m
      The upper quartile is the average of the 3rd and 4th scores from the end.
      Upper quartile = $\frac{1.88 + 1.88}{2} = 1.88$ m
      Interquartile range = 1.88 − 1.765 = 0.115 m
   2. THINK: Compare the two values.
      WRITE: Range = 0.33 m
      Interquartile range = 0.115 m
      $\frac{\text{Range}}{\text{Interquartile range}} = \frac{0.33}{0.115} \approx 2.9$
   3. THINK: Comment on the statement.
      WRITE: Range ≈ 2.9 × interquartile range
      This is almost 3 times, so the statement is true.

c. 1. THINK: Choose 5 players whose mean, median and modal heights are all equal. Trial and error is appropriate here. There may be more than one answer.
      WRITE: Three players have a height of 1.88 m. If one shorter player and one taller player are chosen, both the same distance from 1.88 m, this would make the mean, median and mode all the same.
      Choose players with heights:
      1.8, 1.88, 1.88, 1.88, 1.96
      Mean = $\frac{9.4}{5} = 1.88$ m
      Median = 3rd score = 1.88 m
      Mode = most frequent score = 1.88 m
   2. THINK: Comment on the statement.
      WRITE: The 5 players with heights 1.8 m, 1.88 m, 1.88 m, 1.88 m and 1.96 m have a mean, median and modal height of 1.88 m.
      It is true that a team of 5 such players can be chosen.


TI | THINK and DISPLAY/WRITE
a. 1. In a new problem, on a Lists & Spreadsheet page, label column A as ‘heights’. Enter the data from the question.
   2. Open a Calculator page and complete the entry lines as:
      mean(heights)
      median(heights)
      Press ENTER after each entry.
   DISPLAY/WRITE: The mean height is less than the median height, so the statement is false.
b. To find all the summary statistics, open the Calculator page and press:
      • MENU
      • 6: Statistics
      • 1: Stat Calculations
      • 1: One-Variable Statistics...
      Select 1 as the number of lists. Then on the One-Variable Statistics page, select ‘heights’ as the X1 List and leave the frequency as 1. Leave the next two fields empty, TAB to OK and press ENTER.
   DISPLAY/WRITE: The range is max − min = 1.96 − 1.63 = 0.33.
   Q1 = 1.765 and Q3 = 1.88.
   IQR = Q3 − Q1 = 1.88 − 1.765 = 0.115.
   Now 2.9 × IQR ≈ range, so the statement is true.

CASIO | THINK and DISPLAY/WRITE
a. 1. On the Statistics screen, label list1 as ‘heights’. Enter the data in the table as shown. Press EXE after each value.
   2. To find the statistics summary, tap:
      • Calc
      • One-Variable
      Set values as:
      • XList: main\heights
      • Freq: 1
      Tap OK.
      The mean and median can be found in the list.
   DISPLAY/WRITE: The mean height is less than the median height, so the statement is false.
b. More statistics can be found from the statistics summary.
   DISPLAY/WRITE: The range is max − min = 1.96 − 1.63 = 0.33.
   Q1 = 1.765 and Q3 = 1.88.
   IQR = Q3 − Q1 = 1.88 − 1.765 = 0.115.
   Now 2.9 × IQR ≈ range, so the statement is true.
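
The three statements about the basketball team can also be tested in code. The sketch below is one possible check, not the approach used in the worked example: it converts the heights to whole centimetres (an assumption made to avoid rounding problems with decimals), uses the same halves-based quartile rule as the worked example, and replaces trial and error in part c with a search over every possible team of 5.

```python
from itertools import combinations
from statistics import mean, median, multimode

# Heights in centimetres (whole numbers avoid floating-point rounding issues).
heights_cm = [173, 165, 180, 183, 196, 188, 163, 188, 183, 188, 180, 196]

ordered = sorted(heights_cm)
half = len(ordered) // 2
q1 = median(ordered[:half])   # median of the lower six scores
q3 = median(ordered[half:])   # median of the upper six scores
data_range = ordered[-1] - ordered[0]
iqr = q3 - q1

# Statement a: compare the mean with the median.
print("mean:", round(mean(heights_cm), 1), "median:", median(heights_cm))
# Statement b: compare the range with the interquartile range.
print("range:", data_range, "IQR:", iqr, "ratio:", round(data_range / iqr, 1))

# Statement c: keep every team of 5 whose mean, median and single mode agree.
matches = {
    team for team in combinations(ordered, 5)
    if multimode(team) == [median(team)] and mean(team) == median(team)
}
print("teams with equal mean, median and mode:", matches)
```

An exhaustive search is practical here because there are only 792 ways to choose 5 players from 12.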

12.8.3 Statistical reports


• Reported data must not be simply taken at face value; all reports should be examined with a critical eye.

WORKED EXAMPLE 25 Analysing a statistical report

This is an excerpt from an article that appeared in a newspaper on Father’s Day. It was reported to be
national survey findings of a Gallup Poll of data from 1255 fathers of children aged 17 and under.



THE GREAT AUSSIE DADS SURVEY
All figures are percentages (%).

Thinking about all aspects of your life, how happy would you say you are?
I am very happy: 26
I am fairly happy: 49
Totally happy: 75
Some days I’m happy and some days I’m not: 21
I am fairly unhappy: 3
I am very unhappy: 1
Totally unhappy: 4

How often, if ever, do you regret having children?
Every day: 1
Most days: 2
Some days: 18
Never: 79

Which one of these best describes the impact of having children on your relationship with your partner?
We’re closer than ever: 29
We don’t spend as much time together as we should: 40
We’re more like friends now than lovers: 21
We have drifted apart: 6
None of the above: 4

Which one of these best describes the allocation of cooking and cleaning duties in your household?
My partner does nothing/I do everything: 1
I do most of it: 11
We share the cooking and cleaning: 42
My partner does most of it: 41
I do nothing/my partner does everything: 4
None of the above: 1

Which of these aspects of your children’s future do you have concerns about?
Their safety: 70
Being exposed to drugs: 67
Their health: 54
Bullying or cyber-bullying: 50
Teenage violence: 50
Their ability to afford a home: 50
Alcohol consumption and binge drinking: 47
Achieving academic success: 47
Achieving academic success: 47
Feeling pressured into sex: 41
Being able to afford the lifestyle they expect to have: 38
Climate change: 23
Having them living with you in their mid 20s: 14
None of the above: 3

What is the best thing about being a dad?
The simple pleasures of family life: 61
Enjoying the successes of your kids: 24
The unpredictability it brings: 9
The comfort of knowing that you will be looked after in later life: 3
None of the above: 3

Key findings
75% of Aussie dads are totally happy
79% have never regretted having children
67% are worried about their children being exposed to drugs
57% would like more intimacy with their partner
“Work–life balance is definitely an issue for dads in 2010.”
David Briggs
Galaxy principal

Source: The Sunday Mail, 5 Sept. 2010, pp. 14–15.

a. Comment on the sample chosen.


b. Discuss the percentages displayed.
c. Comment on the claim that 57% of dads would like more intimacy with their partner.

THINK
a. How is the sample chosen? Is it truly representative of the population of Australian dads?
WRITE
a. The results of a national survey such as this should reveal the outlook of the whole nation’s dads. There is no indication of how the sample was chosen, so without further knowledge we tend to accept that it is representative of the population. A sample of 1255 is probably large enough.

TOPIC 12 Univariate data 933


THINK
b. Look at the percentages in each of the categories.
WRITE
b. For the first question regarding happiness, the percentages total more than 100%. It seems logical that, in a question such as this, the respondents would tick only one box, but obviously this has not been the case.
In the question regarding aspects of concern of ‘your children’s future’, these percentages also total more than 100%. It seems appropriate here that dads would have more than one concerning area, so it is possible for the percentages to total more than 100%.
In each of the other three questions, the percentages total 100%, which is appropriate.
THINK
c. Look at the tables to try to find the source of this figure.
WRITE
c. Examining the reported percentages in the question regarding ‘relationship with your partner’, there is no indication how a figure of 57% was determined.
Note: Frequently media reports make claims where the reader cannot confirm their truth.
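The check applied in part b can be sketched in a few lines of Python. The percentages below are copied from three of the single-choice questions in the survey table; the dictionary keys and the function name `total_percentage` are labels chosen for this illustration only, and the 1% rounding allowance is an assumption.

```python
# Percentages reported for three single-choice questions in the survey table.
# The keys are shorthand labels invented for this sketch.
survey = {
    "regret having children": [1, 2, 18, 79],
    "relationship with partner": [29, 40, 21, 6, 4],
    "cooking and cleaning duties": [1, 11, 42, 41, 4, 1],
}


def total_percentage(percentages):
    """Return the sum of the reported percentages for one question."""
    return sum(percentages)


for question, percentages in survey.items():
    total = total_percentage(percentages)
    # A single-choice question should total 100% (allowing 1% for rounding).
    status = "consistent" if abs(total - 100) <= 1 else "check the figures"
    print(f"{question}: {total}% ({status})")
```

A total well above 100% for a question where each respondent should tick only one box is a sign that the reported figures need a closer look.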

WORKED EXAMPLE 26 Analysing a statistical report

This article appeared in a newspaper. Read the article, then answer the following questions.
SPONGES ARE TOXIC
Washing dishes can pose a serious health risk, with more than half of all kitchen sponges containing
high levels of dangerous bacteria, research shows.

A new survey dishing the dirt on washing up shows more than 50 per cent of kitchen sponges have
high levels of E. coli, which can cause severe cramps and diarrhoea, and staphylococcus aureus, which
releases toxins that can lead to food poisoning or toxic shock syndrome.

Microbiologist Craig Andrew-Kabilafkas of Australian Food Microbiology said the Westinghouse study of
more than 1000 households revealed germs can spread easily to freshly washed dishes.

The only way to safeguard homes from sickness was to wash utensils at very high temperatures in a
dishwasher.

Source: The Sunday Mail, 5 Sept. 2010, p. 36.

a. Comment on the sample used in this survey.


b. Comment on the claims of the survey and identify any potential bias.
c. Is the heading of the article appropriate?

THINK
a. Look at sample size and selection of sample.
WRITE
a. The report claims that the sample size was more than 1000. There is no indication how the sample was selected.
The point to keep in mind is whether this sample is truly representative of the population consisting of all households. We have no way of knowing.

934 Jacaranda Maths Quest 10


THINK
b. 1. Determine the results of the survey.
WRITE
b. The survey claims that more than 50% of kitchen sponges have high levels of E. coli, which can cause severe medical problems.
THINK
2. Identify any potential bias.
WRITE
The study was conducted by Westinghouse, so it is not surprising they recommend using a dishwasher. There is no detail of how the sample was selected.
THINK
c. Examine the heading in the light of the contents of the article.
WRITE
c. The heading is quite shocking, designed to catch the attention of readers.

COLLABORATIVE TASK: Secondary data


Use the internet to source secondary data in an article that predicts the number of people likely to be infected with COVID-19.
Reflect on the validity of the claims, consider the sample size used and discuss the ethical considerations of reporting this data
to the wider public.

Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Individual pathway interactivity: Evaluating inquiry methods and statistical reports (int-4631)
Compare statistical reports (int-2790)

Exercise 12.8 Evaluating inquiry methods and statistical reports
12.8 Quick quiz 12.8 Exercise

Individual pathways
PRACTISE CONSOLIDATE MASTER
1, 4, 6, 9, 12 2, 7, 10, 13 3, 5, 8, 11, 14

Fluency
1. WE20,21 You have been given an assignment to investigate which Year level has the greatest number of students who are driven to school each day by car.


a. Explain whether it is more appropriate to use primary or secondary data in this case. Justify your choice.
b. Describe how the data could be collected. Discuss any problems which might be encountered.
c. Explain whether an alternative method would be just as appropriate.



2. WE22 You run a small company that is listed on the Australian Stock Exchange (ASX). During the past year you have given substantial rises in salary to all your staff. However, profits have not been as spectacular as in the year before. This table gives the figures for the salary and profits for each quarter.

                        1st quarter   2nd quarter   3rd quarter   4th quarter
Profits ($’000 000)     6             5.9           6             6.5
Salaries ($’000 000)    4             5             6             7

Draw two graphs, one showing profits, the other showing salaries, which will show you in the best possible
light to your shareholders.
3. WE23 The data below were collected from a real estate agent and show the sale prices of ten blocks of land in a new estate.
$150 000, $190 000, $175 000, $150 000, $650 000, $150 000, $165 000, $180 000, $160 000, $180 000

a. Calculate a measure of central tendency for the data. Explain the reason for this choice.
b. Give a measure of spread of the data, giving a reason for the particular choice.
c. Display the data in a graphical form, explaining why this particular form was chosen.
d. The real estate agent advertises the new estate land as:
Own one of these amazing blocks of land for only $150 000 (average)!
Comment on the agent’s claims.


4. WE24 The following data are the heights of the members of the Australian women’s national basketball team (in metres):
1.73, 1.65, 1.8, 1.83, 1.96, 1.88, 1.63, 1.88, 1.83, 1.88, 1.8, 1.96
Provide calculations and explanations as evidence to verify or refute the following statements.
a. The mean height of the team is closer to the lower quartile than it is to the median.
b. Half the players have a height within the interquartile range.
c. Suggest which 5 players could be chosen to have the minimum range in heights.

5. The resting pulse of 20 female athletes was measured and is shown below.
50 62 48 52 71 61 30 45 42 48 43 47 51 52 34 61 44 54 38 40
a. Represent the data in a distribution table using appropriate groupings.
b. Calculate the mean, median and mode of the data.
c. Comment on the similarities and differences between the three values.

6. The batting scores for two cricket players over six innings were recorded as follows.
Player A: 31, 34, 42, 28, 30, 41
Player B: 0, 0, 1, 0, 250, 0
Player B was hailed as a hero for his score of 250.
Comment on the performance of the two players.



Understanding
7. The table below shows the number of shoes of each size that were sold over a week at a shoe store.

Number
Size sold
4 5
5 7
6 19
7 24
8 16
9 8
10 7

a. Calculate the mean shoe size sold.


b. Determine the median shoe size sold.
c. Identify the modal shoe size sold.
d. Explain which measure of central tendency has the most meaning to the store proprietor.
8. WE25&26 This report from Woolworths appeared in a newspaper.

IT’S A RECORD
• Woolworths posted 10.1% gain in annual profit to $2.02b
• 11th consecutive year of double-digit growth
• Flags 8% to 11% growth in the current financial year
• Sales rose 4.8% to $51.2b
• Wants to increase its share of the fresh food market
• Announced $700m off-market share buyback
• Final fully franked dividend 62% a share

[Graph, ‘Shares rebound’: share price ($) from May 26 to Aug 26, vertical axis marked from 25.60 to 28.40, annotated ‘Yesterday 2.4%’.]
[Graph, ‘Net profit’: net profit ($b, vertical axis marked from 0 to 2) for 2006 to 2010, with columns labelled +10.1% (+$2.02b), +12.8%, +25.7%, +27.5% and +24.3%.]
Source: IRESS

Source: The Courier Mail, 27 Aug. 2010, pp. 40–1.

Comment on the report.



Reasoning
9. Explain the point of drawing a misleading graph in a report.

10. The graph shows the fluctuation in the Australian dollar in terms of the US dollar during the period 13 July to 13 September 2010.
The higher the Australian dollar, the cheaper it is for Australian companies to import goods from overseas, and the cheaper they should be able to sell their goods to the Australian public.
The manager of Company XYZ produced a graph to support his claim that, because there hasn’t been much change in the Aussie dollar over that period, there hasn’t been any change in the price he sells his imported goods to the Australian public.
Draw a graph that would support his claim. Explain how you were able to achieve this effect.

[Graph, ‘AUSSIE’: value of the Australian dollar in US¢ from Jul 13 to Sep 13, vertical axis marked from 80.8 to 92.8, labelled US 93.29¢. Source: IRESS]
Source: The Courier Mail, 14 Sept. 2010, p. 25.

11. Two brands of light globes were tested by a consumer organisation. They obtained the following results.

Brand A (hours lasted): 385, 390, 425, 426, 570, 640, 645, 730, 735, 760
Brand B (hours lasted): 500, 555, 560, 630, 720, 735, 742, 770, 820, 860
a. Complete a back-to-back stem plot for the data.
b. State the brand that had the shortest lifetime. Justify your answer.
c. State the brand that had the longest lifetime. Justify your answer.
d. If you wanted to be certain that a globe you bought would last at least 500 hours, determine which brand
would you buy. Show your working.

Problem solving
12. A small manufacturing plant employs 80 workers. The table below shows the structure of the plant.

Position                   Salary ($)   Number of employees
Machine operator           18 000       50
Machine mechanic           20 000       15
Floor steward              24 000       10
Manager                    62 000       4
Chief Executive Officer    80 000       1

a. Workers are arguing for a pay rise, but the management of the factory claims that workers are well paid because the mean salary of the factory is $22 100.
Explain whether this is a sound argument.
b. Suppose that you were representing the factory workers and had to write a short submission in support of the pay rise.
Explain the management’s claim by providing some other statistics to support your case.



13. Look at the following bar charts and discuss why the one on the left is misleading and what characteristics the one on the right possesses that makes it acceptable.

[Bar chart (left), ‘Massive increase in house prices this year’: average house price in pounds for 1998 and 1999, vertical axis marked from 80 000 to 82 000.]
[Bar chart (right), ‘Small increase in house prices this year’: average house price in pounds for 1998 and 1999, vertical axis marked from 10 000 to 100 000.]

14. a. Determine what is wrong with this pie graph.

2020 presidential run

Candidate A
70%

Candidate B
63%

Candidate C
60%

b. Explain why the following information is misleading.


Did scientists falsify research to support their own theories on global warming?
59% somewhat likely
35% very likely
26% not very likely
c. Discuss the implications of this falsification by statistics.



LESSON
12.9 Review
12.9.1 Topic summary

UNIVARIATE DATA

Populations and samples
• A population is the full set of people/things that you are collecting data on.
• A sample is a subset of a population.
• Samples must be randomly selected from a population in order for the results of the sample to accurately reflect the population.
• It is important to acknowledge that the results taken from a sample may not reflect the population as a whole. This is particularly true if the sample size is too small.
• The minimum sample size that should be used is approximately $\sqrt{N}$, where N is the size of the population.

Measures of central tendency
• The three measures of central tendency are the mean, median and mode.
• The mean is the average of all values in a set of data. It is therefore affected by extreme values.
  $\bar{x} = \dfrac{\sum x}{n}$
• The median is the middle value in an ordered set of data. It is located at the $\dfrac{n+1}{2}$th score.
• The mode is the most frequent value in a set of data.
• For the data set 2, 4, 5, 7, 7:
  The mean is $\dfrac{2 + 4 + 5 + 7 + 7}{5} = 5$.
  The median is the middle value, 5.
  The mode is 7.

Measures of spread
• Measures of spread describe how far the data values are spread from the centre or from each other.
• The range is the difference between the maximum and minimum data values.
  Range = maximum value – minimum value
• The interquartile range (IQR) is the range of the middle 50% of the scores in an ordered set:
  IQR = Q3 – Q1
  where Q1 and Q3 are the first and third quartiles respectively.

Standard deviation
• The standard deviation is a more sophisticated measure of spread.
• The standard deviation measures how far, on average, each data value is away from the mean.
• The deviation of a data value is the difference between it and the mean, $(x - \bar{x})$.
• The formula for the standard deviation is:
  $\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
• The standard deviation is always a positive number.

Box plots
• The five-number summary of a data set is a list containing:
  • the minimum value
  • the lower quartile, Q1
  • the median
  • the upper quartile, Q3
  • the maximum value.
• Box plots are graphs of the five-number summary.
  [Box plot diagram: the lowest score Xmin (lower extreme), the lower quartile Q1, the median M, the upper quartile Q3, the greatest score Xmax (upper extreme)]
• Outliers are calculated and marked on the box plot. A score is considered an outlier if it falls outside the upper or lower boundary.
  Lower boundary = Q1 – 1.5 × IQR
  Upper boundary = Q3 + 1.5 × IQR

Comparing data sets
• Measures of centre and measures of spread are used to compare data sets.
• It is important to consider which measure of centre or spread is the most relevant when comparing data sets.
• For example, if there are outliers in a data set, the median will likely be a better measure of centre than the mean, and the IQR would be a better measure of spread than the range.
• Back-to-back stem-and-leaf plots can be used to compare two data sets.
• Parallel box plots can also be used to compare the spread of two or more data sets.
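As a rough illustration of the summary formulas, the following Python sketch computes the measures for the data set 2, 4, 5, 7, 7 used in the summary. The helper names `quartiles` and `population_sd` are invented for this sketch. It assumes quartiles are found as the medians of the lower and upper halves of the ordered data, leaving out the middle score when the number of scores is odd; calculators and software that use other conventions may give slightly different quartiles.

```python
import math
from statistics import mean, median, mode


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: with an odd number of scores, the middle score is left
    out of both halves. Needs at least two scores.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)


def population_sd(values):
    """Standard deviation with n in the denominator, as in the summary."""
    x_bar = mean(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / len(values))


data = [2, 4, 5, 7, 7]
q1, q3 = quartiles(data)
iqr = q3 - q1

print("mean:", mean(data))
print("median:", median(data))
print("mode:", mode(data))
print("range:", max(data) - min(data))
print("Q1, Q3, IQR:", q1, q3, iqr)
print("standard deviation:", round(population_sd(data), 2))
# Scores outside these boundaries would be treated as outliers.
print("outlier boundaries:", q1 - 1.5 * iqr, q3 + 1.5 * iqr)
```

For this data set the sketch gives a mean of 5, a median of 5 and a mode of 7, matching the values in the summary, with Q1 = 3 and Q3 = 7 under the stated quartile convention.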



12.9.2 Success criteria
Tick the column to indicate that you have completed the lesson and how well you have understood it using the
traffic light system.
(Green: I understand; Yellow: I can do it with help; Red: I do not understand.)

Lesson Success criteria

12.2 I can calculate the mean, median and mode of data presented as
ungrouped data, frequency distribution tables and grouped data.

12.3 I can calculate the range and interquartile range of a data set.

12.4 I can calculate the five-number summary for a set of data.

I can draw a box plot showing the five-number summary of a data set.

I can calculate outliers in a data set.

I can describe the skewness of distributions.

I can compare box plots to dot plots or histograms.

I can draw parallel box plots and compare sets of data.

12.5 I can calculate the standard deviation of a small data set by hand.

I can calculate the standard deviation using technology.

I can interpret the mean and standard deviation of data.

I can identify the effect of outliers on the standard deviation.

12.6 I can choose an appropriate measure of centre and spread to analyse data.

I can interpret and make decisions based on measures of centre and spread.

12.7 I can describe the difference between populations and samples.

I can recognise the difference between a census and a survey and identify a preferred method in different circumstances.

12.8 I can describe the difference between primary and secondary data collection.

I can identify misleading errors or graphical techniques in data.

I can evaluate the accuracy of a statistical report.



12.9.3 Project
Cricket scores
Data are used to predict, analyse, compare and measure many
aspects of the game of cricket. Attendance is tallied at every
match. Players’ scores are analysed to see if they should be kept
on the team.
Comparisons of bowling and batting averages are used to select
winners for awards. Runs made, wickets taken, no-balls bowled,
the number of ducks scored in a game as well as the number of 4s
and 6s are all counted and analysed after the game.
Data of all sorts are gathered and recorded, and measures of
central tendency and spread are then calculated and interpreted.
Sets of data have been made available for you to analyse, and
decisions based on the resultant measures can be made.
Batting averages
The following table shows the runs scored by four cricketers who
are vying for selection to the state team.

Player Runs in the last 25 matches Mean Median Range IQR


Will 13, 18, 23, 21, 9, 12, 31, 21, 20, 18, 14, 16, 28,
17, 10, 14, 9, 23, 12, 24, 0, 18, 14, 14, 20
Rohit 2, 0, 112, 11, 0, 0, 8, 0, 10, 0, 56, 4, 8, 164, 6,
12, 2, 0, 5, 0, 0, 0, 8, 18, 0
Marnus 12, 0, 45, 23, 0, 8, 21, 32, 6, 0, 8, 14, 1, 27, 23,
43, 7, 45, 2, 32, 0, 6, 11, 21, 32
Ben 2, 0, 3, 12, 0, 2, 5, 8, 42, 0, 12, 8, 9, 17, 31, 28,
21, 42, 31, 24, 30, 22, 18, 20, 31

1. Calculate the mean, median, range and IQR scored for each cricketer.
2. You need to recommend the selection of two of the four cricketers. For each player, write two points as
to why you would or would not select them.
Use statistics in your comments.
Bowling averages
The bowling average is the number of runs per wicket taken.

$\text{Bowling average} = \dfrac{\text{number of runs scored}}{\text{number of wickets taken}}$

The smaller the average, the better the bowler has performed.
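The formula can be sketched as a small Python function. The function name `bowling_average` and the figures in the example calls are hypothetical and are not taken from the project tables.

```python
def bowling_average(runs_scored, wickets_taken):
    """Runs scored against the bowler divided by wickets taken."""
    if wickets_taken == 0:
        # The average is undefined when no wickets have been taken.
        raise ValueError("bowling average needs at least one wicket")
    return runs_scored / wickets_taken


# Hypothetical bowlers: the smaller average is the better performance.
print(bowling_average(45, 3))  # 15.0
print(bowling_average(60, 5))  # 12.0
```

When averages from several matches are combined, the overall average is found from the total runs and total wickets, not by averaging the separate match averages.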
Josh and Ravi were competing for three bowling awards:
• Best in semifinal
• Best in final
• Best overall



The following table gives their scores.

Semifinal Final
Runs scored Wickets taken Runs scored Wickets taken
Josh 12 5 28 6
Ravi 10 4 15 3

2. Calculate the bowling averages for the following and fill in the table below.
• Semifinal
• Final
• Overall

Semifinal average Final average Overall average


Josh
Ravi

3. Explain how Ravi can have the better overall average when Josh has the better average in both the
semifinal and final.

Resources
eWorkbook Topic 12 Workbook (worksheets, code puzzle and project) (ewbk-13302)
Interactivities Crossword (int-2860)
Sudoku puzzle (int-3599)

Exercise 12.9 Review questions

Fluency
1. List some problems you might encounter in trying to collect data from the following populations.
a. The average number of mL in a can of soft drink
b. The number of fish in a dam
c. The number of workers who catch public transport to work each weekday morning

2. For each of the following investigations, state whether a census or a survey has been used.
a. The average price of petrol in Canberra was estimated by averaging the price at 30 petrol stations in
the area.
b. The performance of a cricketer is measured by looking at his performance in every match he
has played.
c. Public opinion on an issue is sought by a telephone poll of 2000 homes.

3. Consider the box plot shown.

[Box plot: Score axis marked from 0 to 40 in steps of 5]

a. Calculate the median.
b. Calculate the range.
c. Determine the interquartile range.



4. Calculate the mean, median and mode for each of the following sets of data:
a. 7, 15, 8, 8, 20, 14, 8, 10, 12, 6, 19

b. Key: 1|2 = 12
Stem Leaf
1 26
2 178
3 033468
4 01159
5 136

c.
Score (x) Frequency (f)
70 2
71 6
72 9
73 7
74 4

5. For each of the following data sets, calculate the range.


a. 4, 3, 6, 7, 2, 5, 8, 4, 3
b. x 13 14 15 16 17 18 19
f 3 6 7 12 6 7 8

c. Key: 1|8 = 18
Stem Leaf
1 7889
2 12445777899
3 0001347

6. For each of the following data sets, calculate the interquartile range.
a. 18, 14, 15, 19, 20, 11, 16, 19, 18, 19

b. Key: 9|8 = 9.8


Stem Leaf
8 7889
9 02445777899
10 01113

7. The following back-to-back stem-and-leaf plot shows the typing speed in words per minute (wpm) of 30 Year 8 and Year 10 students.

Key: 2|6 = 26 wpm
Leaf: Year 8 | Stem | Leaf: Year 10
          99 |  0   |
     9865420 |  1   | 79
   988642100 |  2   | 23689
     9776410 |  3   | 02455788
       86520 |  4   | 1258899
             |  5   | 03578
             |  6   | 003

a. Calculate the mean, median, range, interquartile range and standard deviation of the Year 8 data set.
b. Calculate the mean, median, range, interquartile range and standard deviation of the Year 10 data set.
c. Compare the two distributions, using your answers to parts a and b.
d. Using a calculator or otherwise, construct a pair of parallel box-and-whisker plots to represent the two sets of data.



8. The following data give the amount of cut meat (in kg) obtained from each of 20 lambs.

4.5 6.2 5.8 4.7 4.0 3.9 6.2 6.8 5.5 6.1
5.9 5.8 5.0 4.3 4.0 4.6 4.8 5.3 4.2 4.8

a. Detail the data on a stem-and-leaf plot. (Use a class size of 0.5 kg.)
b. Prepare a five-point summary of the data.
c. Draw a box plot of the data.

9. Calculate the standard deviation of each of the following data sets correct to 1 decimal place.
a. 58, 12, 98, 45, 60, 34, 42, 71, 90, 66
b.
x 1 2 3 4 5
f 2 6 12 8 5

c. Key: 1|4 = 14
Stem Leaf
0 1344578
1 00012245789
2 022357

10. MC The Millers obtained a number of quotes on the price of having their home painted. The quotes, to
the nearest hundred dollars, were:
4200, 5100, 4700, 4600, 4800, 5000, 4700, 4900
The standard deviation for this set of data, to the nearest whole dollar, is:
A. 260 B. 278 C. 324
D. 325 E. 900

11. MC The number of Year 12 students who, during semester 2, spent all their spare periods studying in

the resource centre is shown on the stem-and-leaf plot.

Key: 2|5 = 25 students


Stem Leaf
0 8
1
2 5667
3 02369
4 79
5 6
6 1
The standard deviation for this set of data, to the nearest whole number is:
A. 12 B. 14 C. 17
D. 35 E. 53



12. Each week, varying amounts of a chemical are added to a filtering system. The amounts required
(in mL) over the past 20 weeks are shown in the stem-and-leaf plot.

Key: 3|8 represents 0.38 mL

Stem Leaf
2 1
2 22
2 4445
2 66
2 8899
3 0
3 22
3 4
3 6
3 8
Calculate to 2 decimal places the standard deviation of the amounts used.
13. Calculate the mean, median and mode of this data set: 2, 5, 6, 2, 5, 7, 8. Comment on the shape of
the distribution.

14. The box plot shows the heights (in cm) of Year 12 students in a Maths class.

[Box plot: Height (cm) axis marked from 150 to 175 in steps of 5]

a. State the median class height.
b. Calculate the range of heights.
c. Calculate the interquartile range of the heights.

Problem solving
15. MC A data set has a mean of 75 and a standard deviation of 5. Another score of 50 is added to the data set. Choose which of the following will occur.


A. The mean will increase and the standard deviation will increase.
B. The mean will increase and the standard deviation will decrease.
C. The mean will decrease and the standard deviation will increase.
D. The mean will decrease and the standard deviation will decrease.
E. The mean and the standard deviation will both remain unchanged.

16. MC A data set has a mean of 60 and a standard deviation of 10. A score of 100 is added to the data set. This score becomes the highest score in the data set. Choose which of the following will increase.
Note: There may be more than one correct answer.
A. Mean
B. Standard deviation
C. Range
D. Interquartile range
E. Median



17. A sample of 30 people was selected at random from those attending a local swimming pool. Their ages
(in years) were recorded as follows.
19, 7, 58, 41, 17, 23, 62, 55, 40, 37, 32, 29, 21, 18, 16,
10, 40, 36, 33, 59, 65, 68, 15, 9, 20, 29, 38, 24, 10, 30

a. Calculate the mean and the median age of the people in this sample.
b. Group the data into class intervals of 10 (0–9 etc.) and complete the frequency distribution table.
c. Use the frequency distribution table to estimate the mean age.
d. Calculate the cumulative frequency and hence plot the ogive.
e. Estimate the median age from the ogive.
f. Compare the mean and median of the original data in part a with the estimates of the mean and the
median obtained for the grouped data in parts c and e.
g. Determine if the estimates were good enough. Explain your answer.

18. The table below shows the number of cars that are garaged at each house in a certain street each night.

Number of cars Frequency


1 9
2 6
3 2
4 1
5 1

a. Show these data in a frequency histogram.


b. State if the data is positively or negatively skewed. Justify your answer.



19. Consider the data set represented by the frequency histogram shown.

[Frequency histogram: Score on the horizontal axis (marked 0 to 5), Frequency on the vertical axis (marked 1 to 10)]

a. Explain if the data is symmetrical.
b. State if the mean and median of the data can be seen. If so, determine their values.
c. Evaluate the mode of the data.

20. There are three m values in a data set for which $\bar{x} = m$ and $\sigma = \dfrac{m}{2}$.
a. Comment on the changes to the mean and standard deviation if each value of the data set is multiplied by m.
b. An additional value is added to the original data set, giving a new mean of m + 2. Evaluate the additional value.

21. The following data show the number of pets in each of the 12 houses in Coral Avenue, Rosebud.
2, 3, 3, 2, 2, 3, 2, 4, 3, 1, 1, 0

a. Calculate the mean and median number of pets.


b. The empty block of land at the end of the street was bought by a Cattery and now houses 20 cats.
Recalculate the mean and median.
c. Explain why the answers are so different, and which measure of central tendency is best used for
certain data.



22. The number of Year 10 students in all the 40 schools in the Northern District of the Education
Department was recorded as follows.
56, 134, 93, 67, 123, 107, 167, 124, 108, 78, 89, 99, 103, 107, 110, 45, 112, 127, 106, 111, 127, 145,
87, 75, 90, 123, 100, 87, 116, 128, 131, 106, 123, 87, 105, 112, 145, 115, 126, 92
a. Using an interval of 10, produce a table showing the frequency for each interval.
b. Use the table to estimate the mean.
c. Calculate the mean of the ungrouped data.
d. Compare the results from parts b and c and explain any differences.

23. The following back-to-back stem-and-leaf plot shows the ages of a group of 30 males and 30 females as
they enter hospital for the first time.
Key: 1 | 7 = 17
Leaf: Male Stem Leaf: Female
98 0 5
998886321 1 77899
87764320 2 00 1 2 4 5 5 6 7 9
86310 3 013358
752 4 2368
53 5 134
6 2
8 7
a. Construct a pair of parallel box plots to represent the two sets of data, showing working out for the
median and 1st and 3rd quartiles.
b. Calculate the mean, range and IQR for both sets of data.
c. Determine any outliers if they exist.
d. Write a short paragraph comparing the data.

24. The test scores, out of a total score of 50, for two classes A and B are shown in the back-to-back
stem-and-leaf plot.
Key: 1 | 4 = 14
Leaf: Class A Stem Leaf: Class B
5 0 124
9753 1 145
97754 2 005
886551 3 155
320 4 157789
0 5 00
a. Ms Vinculum teaches both classes and made the statement that ‘Class A’s performance on the test
showed that the students’ ability was more closely matched than the students’ ability in Class B’.
By calculating the measure of centre, first and third quartiles, and the measure of spread for the test
scores for each class, explain if Ms Vinculum’s statement was correct.
b. Would it be correct to say that Class A performed better on the test than Class B?
Justify your answer by comparing the quartiles and median for each class.



25. The times, in seconds, of the duration of 20 TV advertisements shown in the 6–8 pm time slot are
recorded below.

16, 60, 35, 23, 45, 15, 25, 55, 33, 20, 22, 30, 28, 38, 40, 18, 29, 19, 35, 75

a. From the data, determine:


i. the mode
ii. the median
iii. the mean (write your answer correct to 2 decimal places)
iv. the range
v. the lower quartile
vi. the upper quartile
vii. the interquartile range.
b. Using your results from part a, construct a box plot for the time, in seconds, for the 20 TV
advertisements in the 6–8 pm time slot.
c. From your box plot, determine:
i. the percentage of advertisements that are more than 39 seconds in length
ii. the percentage of advertisements that last between 21 and 39 seconds
iii. the percentage of advertisements that are more than 21 seconds in length

The types of TV advertisements during the 6–8 pm time slot were categorised as Fast food,
Supermarkets, Program information and Retail (clothing, sporting goods, furniture). A frequency
table for the frequency of these advertisements is shown below.

Type Frequency
Fast food 7
Supermarkets 5
Program information 3
Retail 5

d. State the type of data that has been collected in the table.
e. Determine the percentage of advertisements that are advertisements for fast food outlets.
f. Suggest a good option for a graphical representation of this type of data.



26. The speeds, in km/h, of 55 cars travelling along a major road are recorded in the following table.

Speed Frequency
60–64 1
65–69 1
70–74 10
75–79 13
80–84 9
85–89 8
90–94 6
95–99 3
100–104 2
105–109 1
110–114 1
Total 55

a. By calculating the midpoint for each class interval, determine the mean speed, in km/h, of the cars
travelling along the road.
Write your answer correct to 2 decimal places.
b. The speed limit along the road is 75 km/h. A speed camera is set to photograph the license plates of cars travelling 7% more than the speed limit.
A speeding fine is automatically sent to the owners of the cars photographed.
Based on the 55 cars recorded, determine the number of speeding fines that were issued.

c. Drivers of cars travelling 5 km/h up to 15 km/h over the speed limit are fined $135. Drivers of cars travelling more than 15 km/h and up to 25 km/h over the speed limit are fined $165 and drivers of cars recorded travelling more than 25 km/h and up to 35 km/h over the speed limit are fined $250.
Drivers travelling more than 35 km/h over the speed limit pay a $250 fine in addition to having their driver’s license suspended.
Assume that this data is representative of the speeding habits of drivers along a major road and there are 30 000 cars travelling along this road in any given month.
i. Determine the amount, in dollars, collected in fines throughout the month. Write your answer
correct to the nearest cent.
ii. Evaluate the number of drivers that would expect to have their licenses suspended throughout
the month.

To test your understanding and knowledge of this topic, go to your learnON title at
www.jacplus.com.au and complete the post-test.



Answers b.
Cumulative
Class interval Frequency frequency
Topic 12 Univariate data 0–9 5 5
12.1 Pre-test 10–19 5 10
1. 2 20–29 5 15
2. 29 30–39 3 18
3. 72 40–49 5 23
4. 29.8 50–59 3 26
5. subset
60–69 3 29
6. 22
70–79 1 30
7. 13
Total 30
8. Positively skewed

Mean = $32.50, median = $30


9. 11

x = 184.42 and 𝜍 = 7.31


10. C
c. [Ogive: cumulative frequency (0 to 30) against amount spent ($), 0 to 80]
11.
12. D
13. B
14. The mean typing speed is 26.53 and the IQR is 19 for Year 8. The mean typing speed is 40.53 and the IQR is 20 for Year 10. This suggests that the mean typing speed for Year 10 is greater than that of the Year 8 students. The interquartile range is not the same for Year 8 and Year 10.
15. E
12.2 Measures of central tendency
1. a. 7 b. 8 c. 8
2. a. 6.875 b. 7 c. 4, 7
3. a. 39.125 b. 44.5 c. No mode
4. a. 4.857 b. 4.8 c. 4.8
d. The mean is slightly underestimated; the median is exact. The estimate is good enough as it provides a guide only to the amount that may be spent by future customers.
15. a. 3
b. 4, 5, 5, 5, 6 (one possible solution)
c. One possible solution is to exchange 15 with 20.
16. a. Frequency column: 16, 6, 4, 2, 1, 1

Science: mean = 57.6, median = 57, mode = 42, 51


5. a. 12 b. 12.625 c. 13.5 b. 6.8

Maths: mean = 69.12, median = 73, mode = 84


6. c. 0–4 hours
d. 0–4 hours
7. a. 5.83 b. 6 c. 6 17. a. Frequency column: 1, 13, 2, 0, 1, 8

Mean = 2.5, median = 2.5


8. a. 14.425 b. 15 c. 15 b. Age of emergency

b. Mean = 4.09, median = 3


9. a. ward patients

15
c. Median

b. 70− < 80 70− < 80


Frequency

15
10. a. 70 c. 10
16

65− < 70
11. 124.83 5
12.
13. a. B b. B 0
7.5 22.5 37.5 52.5 67.5 82.5

Mean = $32.93, median = $30


c. C d. D Age
14. a. c. Asymmetrical or bimodal (as if the data come from two
separate graphs)

e. 15 − <30
d. 44.1

f. 15 − <30



g. [Ogive: cumulative frequency (0 to 28) and cumulative frequency (%) against age, 0 to 90]
h. 28
i. No
j. Sample responses can be found in the worked solutions in the online resources.
18. A
19. a. Player A median = 34.33, Player B median = 41.83
b. Player B
c. Player A median = 32.5, Player B median = 0
d. Player A
e. Player A is more consistent. One large score can distort the mean.
20. Sample responses can be found in the worked solutions in the online resources.
21. a. Frequency column: 3, 8, 5, 3, 1
b. 50.5
c. 40–<50
d. 40–<50
e. [Ogive of pulse rate of female athletes: cumulative frequency (0 to 20) and cumulative frequency (%) against beats per minute, 30 to 70]
f. Approximately 48 beats/min
22. Answers will vary. Sample responses include:
a. 3, 4, 5, 5, 8
b. 4, 4, 5, 7, 10
c. 2, 3, 6, 6, 12
23. 12
24. (2a + b)/3
25. 13, 31, 31, 47, 53, 59

12.3 Measures of spread
1. a. 15 b. 77.1 c. 9
2. a. 7 b. 7 c. 8.5 d. 39
3. a. 3.3 kg b. 1.5 kg
4. 22 cm
5. 0.8
6. C
7. a. [Ogive: cumulative frequency (0 to 40) against battery life (h), 50 to 80]
b. i. 62.5
ii. Q1 = 58, Q3 = 67
iii. 9
iv. 14
v. 6
8. IQR = 28
[Ogive: cumulative frequency (0 to 55) against class interval, 120 to 200]
9. a. i. Range = 23
ii. IQR = 13.5
b. i. Range = 45
ii. IQR = 27.5
c. i. Range = 49
ii. IQR = 20
10. Measures of spread tell us how far apart the values (scores) are from one another.
11. a. 25.5
b. 28
c. 39
d. 6
e. The three lower scores affect the mean but not the median or mode.
12. a. Men: mean = 32.3; median = 32.5; range = 38; IQR = 14
Women: mean = 29.13; median = 27.5; range = 36; IQR = 13
b. Typically, women marry younger than men, although the spread of ages is similar.
13. Mean = 25.036, median = 24.9, mode = 23.6, range = 8.5, IQR = 3.4
14. a = 22, b = 9, c = 9 and d = 8



Range = 10 − 1 = 9
15. a. Yes b.

Median (middle score) = 6

To maintain range, min = 1 and max = 10.


b. 6

If median = 6, then 3rd and 4th scores must be 6.


Therefore, the 6th score must be 6. This will maintain the range, Q1, Q3 and median.

12.4 Box plots
1. a. 5 b. 26
2. a. 6 b. 27
3. a. 5.8 b. 18.6
4. a. 140 b. 55 c. 90 d. 85 e. 26
5. a. 58 b. 31 c. 43 d. 27 e. 15
6. B
7. C
8. D, E
9. a. 22, 28, 35, 43, 48
b. [Box plot: sales, axis 20 to 50]
10. a. (10, 13.5, 22, 33.5, 45)
b. [Box plot: rainfall (mm), axis 0 to 50]
11. a. (18, 20, 26, 43.5, 74)
b. [Box plot: age, axis 10 to 70]
c. The distribution is positively skewed, with most of the offenders being young drivers.
12. a. (124 000, 135 000, 148 000, 157 000, 175 000)
b. [Box plot: ($ × 1000), axis 120 to 180]
13. a. [Graphs, axis 0 to 7]
Both graphs indicate that the data is slightly positively skewed. However, the box plot provides an excellent summary of the centre and spread of the distribution.
b. [Graphs, axis 0 to 14]
Both graphs indicate that the data is slightly negatively skewed. However, the box plot provides an excellent summary of the centre and spread of the distribution.
14. [Histogram: frequency (0 to 10) against number of passengers on bus journeys, 0 to 140]
[Box plot: number of passengers, axis 0 to 140]
15. a.
        Xmin  Q1  Median  Q3     Xmax
Before  75    86  95      128.5  152
After   66    81  87      116    134
b. [Parallel box plots: After and Before, axis 60 to 160]
c. As a whole, the program was effective. The median weight dropped from 95 kg to 87 kg, a loss of 8 kg. A noticeable shift in the graph shows that after the program 50% of participants weighed between 66 and 87 kg, compared to 25% of participants weighing between 75 and 86 kg before they started.
Before the program, the range of weights was 77 kg (from 75 kg to 152 kg); after the program, the range had decreased to 68 kg. The IQR also diminished from 42.5 kg to 35 kg.
16. The advantages of box plots are that they are clear visual representations of the 5-number summary, display outliers and can handle a large volume of data. The disadvantage is that individual scores are lost.



17. a. Key: 12|1 = 121
Stem | Leaf
12 | 1569
13 | 124
14 | 3488
15 | 022257
16 | 35
17 | 29
18 | 1112378
b. [Box plot: number sold, axis 120 to 180]
c. On most days the hamburger sales are less than 160. Over the weekend the sales figures spike beyond this.
18. a. Key: 1*|7 = 17 years
Stem | Leaf
1* | 7788899
2 | 000122223333444
2* | 5589
3 | 123
3* |
4 |
4* | 8
b. [Box plot: age, axis 15 to 45, with one outlier marked ×]
c. The distribution is positively skewed, with first-time mothers being under the age of 30. There is one outlier (48) in this group.
19. [Graph: f against size]
20. a. HJ Looker: median = 5; Hane and Roarne: median = 6
b. HJ Looker
c. HJ Looker
d. Hane and Roarne had a higher median and a lower spread and so they appear to have performed better.
21. a. $50 b. $135 c. $100 d. $45 e. 50%
22. a. 9625 seconds
b. i. Under 20 − (20–24): 750 seconds difference
ii. Under 20 − (25–29): 500 seconds difference
iii. (20–24) − (25–29): 250 seconds difference
c. The under-20s performed best of the three groups, with the fastest time for each metric (minimum time, first quartile, median, third quartile and maximum time). The next best performing group was the 25–29-year-olds. They had the same median as the 20–24-year-olds, but outperformed them in all of the other metrics. The 25–29-year-olds were the most consistent group, with a range of 1750 seconds compared to the range of 2000 seconds of the other groups.

12.5 The standard deviation (Optional)
1. a. 2.29 b. 2.19 c. 20.17 d. 3.07
2. a. 2 b. 1 c. 12 d. 0.3
3. a. 1.97 b. 16.47
4. a. 1.03 b. 1.33 c. 2.67 d. 2.22
5. 10.82
6. Mean = 1.64%, Std dev. = 0.45%
7. Mean = 1.83, Std dev. = 0.06 m
8. 0.49 s
9. 15.10 calls
10. B
11. The mean of the first 25 cars is 89.24 km/h with a standard deviation of 5.60. The mean of the first 26 cars is 91.19 with a standard deviation of 11.20, indicating that the extreme speed of 140 km/h is an anomaly.
12. The standard deviation tells us how spread out the data is from the mean.
13. a. σ ≈ 3.58
b. The mean is increased by 7 but the standard deviation remains at σ ≈ 3.58.
c. The mean is tripled and the standard deviation is tripled to σ ≈ 10.73.
14. The standard deviation will decrease because the average distance to the mean has decreased.
15. 57 is two standard deviations above the mean.
16. a. New mean is the old mean increased by 3 (15) but no change to the standard deviation.
b. New mean is 3 times the old mean (36) and new standard deviation is 3 times the old standard deviation (12).
17. a. 43 b. 12 c. 12.19

12.6 Comparing data sets
1. a. The mean of the first set is 63. The mean of the second set is 63.
b. The standard deviation of the first set is 2.38. The standard deviation of the second set is 6.53.
c. For both sets of data the mean is the same, 63. However, the standard deviation for the second set (6.53) is much higher than the standard deviation of the first set (2.38), implying that the second set is more widely distributed than the first. This is confirmed by the range, which is 67 − 60 = 7 for the first set and 72 − 55 = 17 for the second.
2. a. Morning: median = 2.45; afternoon: median = 1.6
b. Morning: range = 3.8; afternoon: range = 5
c. The waiting time is generally shorter in the afternoon. One outlier in the afternoon data causes the range to be larger. Otherwise the afternoon data are far less spread out.
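Answers 13 and 16 of section 12.5 rest on two facts: adding a constant to every score shifts the mean but leaves the standard deviation unchanged, and multiplying every score by a constant multiplies both by that constant. The sketch below checks this numerically. The data set is hypothetical (it is not the one used in the exercise), and the population standard deviation is assumed because the answers are written with σ.

```python
from statistics import mean, pstdev

# Hypothetical scores, chosen only for illustration (not the textbook's data).
scores = [4, 8, 9, 11, 13, 15]

# Add 7 to every score, and separately triple every score.
shifted = [x + 7 for x in scores]
tripled = [3 * x for x in scores]

original_mean, original_sd = mean(scores), pstdev(scores)
shifted_mean, shifted_sd = mean(shifted), pstdev(shifted)
tripled_mean, tripled_sd = mean(tripled), pstdev(tripled)

print("original:", original_mean, round(original_sd, 2))
print("add 7:   ", shifted_mean, round(shifted_sd, 2))   # mean + 7, same spread
print("times 3: ", tripled_mean, round(tripled_sd, 2))   # mean x 3, spread x 3
```

The same behaviour holds for the sample standard deviation (`statistics.stdev`), since both measures depend only on distances from the mean.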



3. Key: 16|1 = 1.61 m
Leaf: Boys | Stem | Leaf: Girls
997 | 15 | 1256788
98665540 | 16 | 4467899
4421 | 17 | 0
4. a. Ford: median = 15; Hyundai: median = 16
b. Ford: range = 26; Hyundai: range = 32
c. Ford: IQR = 14; Hyundai: IQR = 13.5
d. [Parallel box plots: Ford and Hyundai, axis 0 to 40]
5. a. Brisbane Lions
b. Brisbane Lions: range = 65; Sydney Swans: range = 55
c. Brisbane Lions: IQR = 40; Sydney Swans: IQR = 35
6. a. [Parallel box plots: Girls and Boys, height, axis 1.4 to 1.9]
b. Boys: median = 1.62; girls: median = 1.62
c. Boys: range = 0.36; girls: range = 0.23
d. Boys: IQR = 0.25; girls: IQR = 0.17
e. Although boys and girls have the same median height, the spread of heights is greater among boys as shown by the greater range and interquartile range.
7. a. Summer: range = 23; winter: range = 31
b. Summer: IQR = 14; winter: IQR = 11
c. There are generally more cold drinks sold in summer as shown by the higher median. The spread of data is similar as shown by the IQR although the range in winter is greater.
8. A
9. A, B, C, D
10. a. Cory achieved a better average mark in Science (59.25) than he did in English (58.125).
b. Cory was more consistent in English (σ = 4.9) than he was in Science (σ = 19.7).
11. Sample responses can be found in the worked solutions in the online resources.
12. a. Back street: x̄ = 61, σ = 4.3; main road: x̄ = 58.8, σ = 12.1
b. The drivers are generally driving faster on the back street.
c. The spread of speeds is greater on the main road as indicated by the higher standard deviation.
13. a. [Parallel box plots: Machine A and Machine B, number of Smarties in a box, axis 40 to 60]
b. Machine A: mean = 49.88, standard deviation = 2.87; Machine B: mean = 50.12, standard deviation = 2.44
c. Machine B is more reliable, as shown by the lower standard deviation and IQR. The range is greater on machine B only because of a single outlier.
14. a. Nathan: mean = 15.1; Timana: mean = 12.3
b. Nathan: range = 36; Timana: range = 14
c. Nathan: IQR = 15; Timana: IQR = 4
d. Timana’s lower range and IQR show that he is the more consistent player.
15. a. Yes, Maths
b. Science: positively skewed
c. The Science test may have been more difficult.
d. Science: 61–70, Maths: 71–80
e. Maths has a greater standard deviation (12.6) compared to Science (11.9).
16. a. Key: 2|3 = 2.3 hours
Leaf: Group A | Stem | Leaf: Group B
873 | 1 | 78
951 | 2 | 01245588
875422 | 3 | 222455568
754222 | 4 | 2
 | 5 |
22 | 6 |
b. Five-point summary
Group A: 13 27 36 43 62
Group B: 17 23 30 35 42
[Parallel box plots: Group B and Group A, axis 10 to 70; axis labels "Hours" and "Nouns"]
c. Sample responses can be found in the worked solutions in the online resources.
d. Sample responses can be found in the worked solutions in the online resources.
e. Sample responses can be found in the worked solutions in the online resources.
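The five-point summaries quoted in answer 16 b of section 12.6 can be reproduced from the back-to-back stem plot in 16 a. The sketch below assumes the quartiles are the medians of the lower and upper halves of the sorted data (with the middle score left out when the count is odd). It also assumes each stem and leaf is read as a two-digit value, to match the printed summary, even though the key suggests a decimal reading. The function name `five_number_summary` is illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd number of scores the middle score belongs to neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Values read from the stem plot in answer 16 a (stem = tens, leaf = units).
group_a = [13, 17, 18, 21, 25, 29, 32, 32, 34, 35,
           37, 38, 42, 42, 42, 44, 45, 47, 62, 62]
group_b = [17, 18, 20, 21, 22, 24, 25, 25, 28, 28,
           32, 32, 32, 34, 35, 35, 35, 36, 38, 42]

summary_a = five_number_summary(group_a)
summary_b = five_number_summary(group_b)
print("Group A:", summary_a)
print("Group B:", summary_b)
```

Other quartile conventions (for example those used by spreadsheet functions) can give slightly different Q1 and Q3 values for the same data.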



17. a.
Leaf: Year 10 | Stem | Leaf: Year 11
98754 | 15 |
9874330 | 16 | 03678
7532200 | 17 | 223456779
0 | 18 | 01358
 | 19 | 0
b. [Parallel box plots: Year 12 and Year 10, x axis 150 to 190]
c. On average, the Year 12 students are about 6–10 cm taller than the Year 10 students. The heights of the majority of Year 12 students are between 170 cm and 180 cm, whereas the majority of the Year 10 students are between 160 and 172 cm in height.
18. a. English: mean = 70.25; Maths: mean = 69
b. English: range = 53; Maths: range = 37
c. English: σ = 16.1; Maths: σ = 13.4
d. Kloe has performed more consistently in Maths as the range and standard deviation are both lower.
19. 32%
20. a. [Parallel box plots, axis 50 to 85]
The parallel box plots show a significant gap between the life expectancy of Aboriginal and Torres Strait Islander people and that of non-Aboriginal and non–Torres Strait Islander people. Even the maximum median age of Aboriginal and Torres Strait Islander people is much lower than the minimum of non-Aboriginal and non–Torres Strait Islander people.
b. The advantage of box plots is that they give a clear graphical representation of the results and in this case show a significant difference between the median life expectancy of Aboriginal and Torres Strait Islander people and non-Aboriginal and non–Torres Strait Islander people. The disadvantage is that we lose the data for individual states and territories.

12.7 Populations and samples
1. a. When was it first put into the machine? How old was the battery before being purchased? How frequently has the computer been used on battery?
b. Can’t always see if a residence has a dog; a census is very time-consuming; perhaps could approach council for dog registrations.
c. This number is never constant with ongoing purchases, and continuously replenishing stock.
d. Would have to sample in this case as a census would involve opening every packet.
2. Sample responses can be found in the worked solutions in the online resources.
3. a. Census. The airline must have a record of every passenger on every flight.
b. Survey. It would be impossible to interview everyone.
c. Survey. A census would involve opening every bottle.
d. Census. The instructor must have an accurate record of each learner driver’s progress.
4. a. Survey b. Survey c. Census d. Survey
5. a. About 25
b. Drawing numbers from a hat, using a calculator.
6. a. The council is probably hoping it is a census, but it will probably be a survey because not all those over 10 will respond.
b. Residents may not all have internet access. Only those who are highly motivated are likely to respond.
7. The sample could have been biased. The questionnaire may have been unclear.
8. Sample size, randomness of sample
9. Sample responses can be found in the worked solutions in the online resources.
10. Sample response: A census of very large populations requires huge amounts of infrastructure and staff to collect the information for large numbers of people. These challenges could also be made harder because many people live in remote areas with poor transport access for census staff; forms may need to be created in multiple languages; and migrants who do not have residency permits may be unwilling to complete a census.
11. There is quite a variation in the frequency of particular numbers drawn. For example, the number 45 has not been drawn for 31 weeks, while most have been drawn within the last 10 weeks. In the long term, one should find the frequency of drawing each number is roughly the same. It may take a long time for this to happen, as only 8 numbers are drawn each week.
12. a. Mean = 32.03; median = 29.5
b.
Class interval  Frequency
0–9             2
10–19           7
20–29           6
30–39           6
40–49           3
50–59           3
60–69           3
Total           30
c. Mean = 31.83
d. [Ogive: cumulative frequency (0 to 30) against age, 0 to 70]
e. Median = 30
f. Estimates from parts c and e were fairly accurate.
g. Yes, they were fairly close to the mean and median of the raw data.



13. Year 8: mean = 26.83, median = 27, range = 39, IQR = 19
Year 10: mean = 40.7, median = 39.5, range = 46, IQR = 20
The typing speed of Year 10 students is about 13 to 14 wpm faster than that of Year 8 students. The spread of data in Year 8 is slightly less than the spread in Year 10.

12.8 Evaluating inquiry methods and statistical reports
1. a. Primary. There is probably no secondary data available.
b. Sample responses can be found in the worked solutions in the online resources.
c. Sample responses can be found in the worked solutions in the online resources.
2. [Graph: Company profits, profits ($’000 000) from 5.9 to 6.5 against quarter, 1 to 4]
[Graph: Company salaries, mean salaries ($’000 000) from 0 to 15 against quarter, 1 to 4]
3. a. Mean = $215 000, median = $170 000, mode = $150 000. The median best represents these land prices. The mean is inflated by one large score, and the mode is the lowest price.
b. Range = $500 000, interquartile range = $30 000. The interquartile range is the better measure of spread.
c. [Dot plot: price, axis 0 to 600 000]
This dot plot shows how 9 of the scores are grouped close together, while the score of $650 000 is an outlier.
d. The agent is quoting the modal price, which is the lowest price. This is not a true reflection of the average price of these blocks of land.
4. a. False. Mean = 1.82 m, lower quartile = 1.765 m, median = 1.83 m
b. True. This is the definition of interquartile range.
c. Players with heights 1.83 m, 1.83 m, 1.88 m, 1.88 m, 1.88 m.
5. a.
Resting pulse  Frequency  Middle value, x  x × frequency  Cumulative frequency
30–39          3          34.5             103.5          3
40–49          8          44.5             356.0          11
50–59          5          54.5             272.5          16
60–69          3          64.5             193.5          19
70–79          1          74.5             74.5           20
b. Mean = 48.65
Median = 48
Mode = 48
c. From the table it can be clearly seen that the highest concentration of resting pulse readings was in the 40–49 and 50–59 groups. All three measures of central tendency fell within these two groupings.
Because of the higher frequency in the 40–49 group, it was not surprising that the mode and median were contained there also. The mean was slightly higher, and this would have been influenced by the one reading in the 70–79 group.
6. Player B appears to be the better player if the mean result is used. However, Player A is the more consistent player.
7. a. 7.1 b. 7 c. 7
d. The mode has the most meaning as this size sells the most.
8. Points that could be mentioned include:
10.1% is only just ‘double digit’ growth.
2006–08 showed mid to low 20% growth. Growth has been declining since 2008.
The share price has rebounded, but not to its previous high.
The share price scale is not consistent. Most increments are 30c, except for $27.70 to $28.10 (40c increment). Note also the figure of 20.80, probably a typo instead of 26.80.
9. A misleading graph steers/convinces the reader towards a particular opinion. It can be biased and not accurate.
10. Shorten the y-axis and expand the x-axis.
[Graph: Aussie dollar in US cents (80 c to 90 c) against time, 13 July to 13 September]
11. a. Key: 3|85 = 385 hours
Leaf: Brand B | Stem | Leaf: Brand A
 | 3 | 85 90
 | 4 | 25 26
60 55 00 | 5 | 70
30 | 6 | 40 45
70 42 35 20 | 7 | 30 35 60
60 20 | 8 |
Brand A: mean = 570.6, median = 605
Brand B: mean = 689.2, median = 727.5
b. Brand A had the shortest mean lifetime.



c. Brand B had the longest mean lifetime.
d. Brand B
12. a. The statement is true, but misleading as most of the employees earn $18000.
b. The median and modal salary is $18000 and only 15 out of 80 (less than 20%) earn more than the mean.
13. The left bar chart suggests prices have tripled in one year because the vertical axis does not start at zero. The bar chart on the right is truly indicative of the situation.
14. a. Percentages do not add to 100%.
b. Percentages do not add to 100%.
c. Such representation allows multiple choices to have closer percentages than really exist.

Project
1.
Player | Runs in the last 25 matches | Mean | Median | Range | IQR
Will | 13, 18, 23, 21, 9, 12, 31, 21, 20, 18, 14, 16, 28, 17, 10, 14, 9, 23, 12, 24, 0, 18, 14, 14, 20 | 16.76 | 17 | 31 | 8.5
Rohit | 2, 0, 112, 11, 0, 0, 8, 0, 10, 0, 56, 4, 8, 164, 6, 12, 2, 0, 5, 0, 0, 0, 8, 18, 0 | 17.04 | 4 | 164 | 10.5
Marnus | 12, 0, 45, 23, 0, 8, 21, 32, 6, 0, 8, 14, 1, 27, 23, 43, 7, 45, 2, 32, 0, 6, 11, 21, 32 | 16.76 | 12 | 45 | 25.5
Ben | 2, 0, 3, 12, 0, 2, 5, 8, 42, 0, 12, 8, 9, 17, 31, 28, 21, 42, 31, 24, 30, 22, 18, 20, 31 | 16.72 | 17 | 42 | 25
2. a. Will: has a similar mean and median, which shows he was fairly consistent. The range and IQR values are low, indicating that his scores remain at the lower end with not much deviation for the middle 50%.
b. Rohit: has the best average but a very low median, indicating his scores are not consistent. The range is extremely high and the IQR very low in comparison, showing he can score very well at times but is not a consistent scorer.
c. Marnus: has a similar mean to Will and Ben but a lower median, indicating his scores are sometimes high but generally are lower than the average. The range and IQR show a consistent batting average and spread with only a few higher scores and some lower ones.
d. Ben: has a similar mean and median, which shows he was a consistent player. The range and IQR show a consistent batting average and spread.
Players to be selected:
Would recommend Will if the team needs someone with very consistent batting scores every game but no outstanding runs.
Would recommend Rohit if the team needs someone who might score very high occasionally but in general fails to score many runs.
Would recommend Marnus if the team needs someone who is fairly consistent but can score quite well at times and the rest of the time does OK.
Would recommend Ben if the team needs someone who is fairly consistent but can score quite well at times and the rest of the time has a better median than Glenn.
3.
       Semifinal average  Final average  Overall average
Josh   2.4                4.67           3.64
Ravi   2.5                5              3.57
4. In the final, wickets were more costly than in the semifinal. That is, Josh conceded many runs in getting his six wickets. This affected the overall mean. In reality Josh was the most valuable player overall, but this method of combining the data of the two matches led to this unexpected result.

12.9 Review questions
1. a. You would need to open every can to determine this.
b. Fish are continuously dying, being born, being caught.
c. Approaching work places and public transport offices.
2. a. Survey
b. Census
c. Survey
3. a. 20 b. 24 c. 8
4. a. Mean = 11.55; median = 10; mode = 8
b. Mean = 36; median = 36; mode = 33, 41
c. Mean = 72.18; median = 72; mode = 72
5. a. 6 b. 6 c. 20
6. a. 4 b. 0.85
7. a. Year 8: mean = 26.83, median = 27, range = 39, IQR = 19, sd = 11.45
b. Year 10: mean = 40.7, median = 39.5, range = 46, IQR = 20, sd = 12.98
c. The typing speed of Year 10 students is about 13 to 14 wpm faster than that of Year 8 students. The spread of data in Year 8 is slightly less than in Year 10.
d. [Parallel box plots: Year 10 and Year 8, axis 0 to 70]



8. a. Key: 3*|9 = 3.9 kg
Stem | Leaf
3* | 9
4 | 00023
4* | 5678
5 | 03
5* | 5889
6 | 122
6* | 8
b. (3.9, 4.4, 4.9, 5.85, 6.8)
c. [Box plot: axis 3.5 to 6.5 kg]
9. a. 24.4 b. 1.1 c. 7.3
10. A
11. B
12. 0.05 ml
13. Mean = 5, median = 5, mode = 2 and 5.
The distribution is positively skewed and bimodal.
14. a. Median height = 167 cm
b. Range = 25 cm
c. IQR = 5 cm
15. C
16. A, B and C
17. a. Mean = 32.03; median = 29.5
b.
Class interval  Frequency
0–9             2
10–19           7
20–29           6
30–39           6
40–49           3
50–59           3
60–69           3
Total           30
c. Mean = 31.83
d. [Ogive: cumulative frequency (0 to 30) against amount spent ($), 0 to 80]
e. Median = 30
f. Estimates from parts c and e were fairly accurate.
g. Yes, they were fairly close to the mean and median of the raw data.
18. a. [Histogram: frequency (0 to 9) against number of cars, 0 to 5]
b. Positively skewed: a greater number of scores is distributed at the lower end of the distribution.
19. a. Yes b. Yes. Both are 3.
c. 3
20. a. b.
x = m2 σ = 7m + 2
m2
2
21. a. Mean = 2.17, median = 2
b. Mean = 3.54, median = 2
c. The median relies on the middle value of the data and won’t change much if an extra value is added. The mean however has increased because this large value will change the average of the numbers. The mean is used as a measure of central tendency if there are no outliers or if the data are symmetrical. The median is used as a measure of central tendency if there are outliers or the data are skewed.
22. a.
Interval  Frequency (f)  Midpoint × f
40–49     1              44.5 × 1 = 44.5
50–59     1              54.5 × 1 = 54.5
60–69     1              64.5 × 1 = 64.5
70–79     2              74.5 × 2 = 149
80–89     4              84.5 × 4 = 338
90–99     4              94.5 × 4 = 378
100–109   8              104.5 × 8 = 836
110–119   6              114.5 × 6 = 687
120–129   8              124.5 × 8 = 996
130–139   2              134.5 × 2 = 269
140–149   2              144.5 × 2 = 289
150–159   0              154.5 × 0 = 0
160–169   1              164.5 × 1 = 164.5
Total     40             4270
b. 106.75
c. 107.15
d. The differences in this case were minimal; however, the grouped data mean is not based on the actual data but on the frequency in each interval and the interval midpoint. It is unlikely to yield an identical value to the actual mean. The spread of the scores within the class interval has a great effect on the grouped data mean.
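The grouped-data mean in answer 22 b of the review questions can be checked with a few lines of code. This is a minimal sketch that takes the class intervals and frequencies from the table in 22 a, uses the average of each interval's printed bounds as its midpoint, and divides the sum of midpoint × frequency by the total frequency. The name `grouped_mean` is illustrative only.

```python
# (lower bound, upper bound, frequency) for each class interval in answer 22 a.
classes = [
    (40, 49, 1), (50, 59, 1), (60, 69, 1), (70, 79, 2), (80, 89, 4),
    (90, 99, 4), (100, 109, 8), (110, 119, 6), (120, 129, 8),
    (130, 139, 2), (140, 149, 2), (150, 159, 0), (160, 169, 1),
]

def grouped_mean(classes):
    """Estimate the mean from grouped data using class midpoints."""
    total_frequency = sum(f for _, _, f in classes)
    # Midpoint of each interval is the average of its lower and upper bounds.
    weighted_sum = sum((low + high) / 2 * f for low, high, f in classes)
    return weighted_sum / total_frequency

total_frequency = sum(f for _, _, f in classes)
weighted_sum = sum((low + high) / 2 * f for low, high, f in classes)
estimate = grouped_mean(classes)
print(total_frequency, weighted_sum, estimate)
```

The result is an estimate only: it treats every score in an interval as if it sat at the midpoint, which is why the answer to part c (the mean of the raw data) differs slightly.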



23. a. [Parallel box plots: Females and Males, age, x axis 0 to 80]

b.
Males Females
Mean 28.2 31.1
Range 70 57
IQR 18 22

c. There is one outlier — a male aged 78.


d. Typically males seem to enter hospital for the first time at
a younger age than females.
24. a. Class A: Q1 = 21.5, median = 30, Q3 = 38, IQR = 16.5
Class B: Q1 = 14.5, median = 33, Q3 = 47, IQR = 32.5
Based on the comparison between Class A’s IQR (16.5)
and Class B’s IQR (32.5), Ms Vinculum was correct in
her statement.
b. No, Class B has a higher median and upper quartile score
than Class A, while Class A has a higher lower quartile.
You can’t confidently say that either class did better in
the test than the other.
25. a. i. 35 s
ii. 29.5 s
iii. 33.05 s
iv. 60 s
v. 21 s
vi. 39 s
vii. 18 s
b. [Box plot with 21, 29.5 and 39 marked; t axis from 15 to 75]

c. i. 25%
ii. 50%
iii. 75%
d. Categorical
e. 35%
f. Pictogram, pie chart or bar chart.
26. a. 82.73 km/h
b. 30 cars
c. i. $2 607 272.73
ii. About 545
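The two figures in answer 26 c come from scaling the 55-car sample up to 30 000 cars a month. The sketch below reproduces them under stated assumptions: the counts for the $165 and $250 bands and the single suspension are read from the visible rows of the speed table (90–94 to 110–114 km/h), while the count of 17 cars in the $135 band is inferred from the answer of 30 fined cars in part b rather than read from the table. All variable names are illustrative.

```python
SAMPLE_SIZE = 55        # cars recorded in the sample
MONTHLY_CARS = 30_000   # cars assumed to use the road each month

# (fine in dollars, number of sampled cars in that band)
fine_bands = [
    (135, 17),  # assumed: 30 fined cars in total minus the 13 counted below
    (165, 9),   # 90-94 km/h (6 cars) and 95-99 km/h (3 cars)
    (250, 3),   # 100-104 km/h (2 cars) and 105-109 km/h (1 car)
    (250, 1),   # 110-114 km/h (1 car), fine plus licence suspension
]
suspended_in_sample = 1

fined_cars = sum(count for _, count in fine_bands)
sample_fines = sum(fine * count for fine, count in fine_bands)

# Scale the sample totals up in proportion to monthly traffic.
scale = MONTHLY_CARS / SAMPLE_SIZE
monthly_fines = round(sample_fines * scale, 2)
monthly_suspensions = suspended_in_sample * scale

print(fined_cars, sample_fines, monthly_fines, round(monthly_suspensions))
```

The scaling step assumes the sample is representative of all drivers on the road, as the question states.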

