CH 10 Analysing Data

Uploaded by

Harry White
Copyright
© All Rights Reserved

10.

STATISTICAL ANALYSIS

ANALYSING DATA
The meteorologists at the Bureau of Meteorology measure and record weather data from over 1000
sites in Australia and Antarctica, and calculate statistics about temperature, rainfall and humidity.
Climate averages, such as the median monthly rainfall or the mean number of rainy days per month,
are calculated from weather data gathered over many years and can assist farmers to decide on the
best times to plant crops.

CHAPTER OUTLINE
S1.2 10.01 The mean, median and mode
S1.2 10.02 Quartiles, deciles and percentiles
S1.2 10.03 The range and interquartile range
S1.2 10.04 The effect of outliers
S1.1 10.05 Cumulative frequency graphs
S1.2 10.06 Box plots
S1.2 10.07 Standard deviation
S1.2 10.08 The shape of a distribution
IN THIS CHAPTER YOU WILL:
• calculate and interpret the mean, median and mode of sets of data, including ungrouped data
• calculate and interpret the quartiles, deciles and percentiles of a set of data
• calculate and interpret the range, interquartile range and standard deviation of sets of data,
including ungrouped data
• identify outliers in a set of data and examine their effects on statistical measures
• calculate cumulative frequency and construct cumulative frequency histograms and polygons
• use a cumulative frequency polygon to find the median, quartiles and interquartile range of a
data set
• use a five-number summary to construct box plots
• describe the shape of a distribution using its graph or display

TERMINOLOGY
box plot class centre class interval
cumulative frequency cumulative frequency histogram cumulative frequency polygon
decile distribution extremes
five-number summary interquartile range mean
measure of central tendency measure of spread median
median class modal class mode
ogive outlier peak
percentile population quartile
range sample standard deviation
summary statistics symmetrical

SkillCheck
1 This stem-and-leaf plot shows the ages of visitors entering the Royal Easter Show in a
five-minute period.

Stem | Leaf
0 | 3 8 9
1 | 0 2 2 2 5 6 7 9
2 | 0 2 3 4 6 7
3 | 1 3 3 4 9
4 | 3 4 7 8
5 | 5 5 8

a How many visitors entered the show during the five-minute period?
b What was the age of the oldest visitor?
c What was the most common age?
d How many visitors were under 16 years old?
e What was the middle age?

2 Is a frequency histogram a line graph or a column graph?

3 The dot plot shows the shoe sizes of a sample of Year 11 students.
[Dot plot: Shoe size, axis from 6 to 12]
a How many students are in the sample?
b What is the most common shoe size for these
students?
c Find the outlier and describe the student that has this outlier.
d How many students had a shoe size of 10?
e What percentage of students had a shoe size over 8?

400 NCM 11.  Mathematics Standard (Pathway 2) ISBN 9780170413565


4 A sample of students was surveyed about the number of cars owned by each of their
families. The results are shown in the table.

Number of cars | Frequency
0 | 4
1 | 16
2 | 11
3 | 0
4 | 1

a How many families did not own a car?
b What was the most common number of cars owned?
c What was the highest number of cars owned?
d How many students were surveyed?
e What was the total number of cars owned?

5 The masses (in kilograms) of 40 skydivers were recorded. The results are shown below.

58 63 77 82 53 69 65 80 96 105
79 63 52 90 104 85 65 87 68 105
65 87 109 84 62 75 102 78 93 84
68 105 74 59 68 74 88 66 70 62

Copy and complete the frequency table below using the data about the mass of the
skydivers.

Mass (kg) Class centre Frequency

50 – < 60

60 – < 70

70 – < 80

80 – < 90

90 – < 100

100 – < 110



10.01  The mean, median and mode
The mean, median and mode are three summary statistics that represent the centre or
average of a set of data. They are called the measures of central tendency (or measures of
location).
The mean (or average) has the symbol x̄, and is the sum of all scores divided by the number
of scores.

The mean
$$\text{Mean, } \bar{x} = \frac{\text{sum of scores}}{\text{number of scores}} = \frac{\sum x}{n}$$
where:
• Σ means ‘the sum of’
• x represents a score
• n is the total number of scores.

If the scores in a data set are presented in a frequency distribution table then, by adding an
‘fx’ column, the mean can be calculated using the formula shown below.

Calculating the mean from a frequency table


$$\text{Mean, } \bar{x} = \frac{\text{sum of } fx}{\text{sum of } f} = \frac{\sum fx}{\sum f}$$

The median and mode


When the scores are ordered from lowest to highest, the median is:
• the middle score, for an odd number of scores
• the average of the two middle scores, for an even number of scores.
The mode is the most common score or category. A set of data can have more than one
mode, or no mode at all.
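
The definitions above can be sketched in Python. This is only an illustration, not part of the original text: the function names `mean`, `median` and `modes` and the three sample lists are hypothetical, and returning an empty list when every score occurs exactly once is one common way of expressing 'no mode'.

```python
from collections import Counter


def mean(scores):
    """Sum of all scores divided by the number of scores."""
    return sum(scores) / len(scores)


def median(scores):
    """Middle score (odd count) or average of the two middle scores (even count)."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def modes(scores):
    """All most-common scores; an empty list if no score repeats (no mode)."""
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []
    return [score for score, count in counts.items() if count == highest]


# Hypothetical data sets chosen only to illustrate each case.
odd_set = [3, 9, 4, 4, 7]        # five scores: the median is the 3rd ordered score
even_set = [2, 8, 5, 5, 8, 1]    # six scores, and two modes (5 and 8)
no_repeat_set = [1, 2, 3, 4]     # every score occurs once: no mode

for data in (odd_set, even_set, no_repeat_set):
    print(mean(data), median(data), modes(data))
```

Sorting inside `median` means the caller does not have to order the scores first, which mirrors the 'ordered from lowest to highest' requirement in the definition.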



EXAMPLE 1

For each data set below, find:


  i the mean (correct to one decimal place)
   ii the median
iii the mode.
a The maximum daily temperature (in °C) in Mudgee for the first two
weeks in January:
30 28 26 31 34 35 32 33 21 25 28 32 32 35

b A stem-and-leaf plot of the marks (out of 100) in a maths test for a class of students:

Stem | Leaf
4 | 4 7 7 8
5 | 2 6 8 9 9
6 | 1 3 5 5 7 8
7 | 0 2 3 4 5
8 | 3 7 8
9 | 2 8

Solution

a i  Sum of scores (Σx) = 30 + 28 + 26 + 31 + 34 + 35 + 32 + 33 + 21 + 25 + 28 + 32 + 32 + 35
= 422
$$\text{Mean, } \bar{x} = \frac{\text{sum of scores}}{\text{number of scores}} = \frac{422}{14} = 30.142\,85\ldots \approx 30.1$$
Note that the mean temperature of 30.1°C is at the centre of all 14 temperatures.

ii  Placing the scores in order:
21 25 26 28 28 30 31 32 32 32 33 34 35 35
For 14 scores, the middle scores are the 7th and 8th scores.
$$\text{Median} = \frac{31 + 32}{2} = 31.5$$

iii  Mode = 32, the most common score (it occurred three times).
Note that the mean (30.1), median (31.5) and mode (32) are all around the same central
value.

b i  The sum of the 25 marks is 1671.
$$\text{Mean, } \bar{x} = \frac{1671}{25} = 66.84 \approx 66.8$$

ii  For 25 scores, the middle score is the 13th score.

Stem | Leaf
4 | 4 7 7 8
5 | 2 6 8 9 9
6 | 1 3 5 5 7 8
7 | 0 2 3 4 5
8 | 3 7 8
9 | 2 8

Median = 65
iii  Mode = 47, 59 and 65

The statistics mode on a calculator

Scientific and graphics calculators have a statistics mode (SD or STAT). Follow the
instructions in the tables below to calculate the mean of the temperatures from Example 1a
using your calculator’s statistics mode.

Operation | Casio Scientific | Sharp Scientific
Start statistics mode. | MODE STAT 1-VAR | MODE STAT =
Clear the statistical memory. | SHIFT 1 Edit, Del-A | 2ndF DEL
Enter data. | SHIFT 1 Data to get table; 30 = 28 = , etc. to enter in column; AC to leave table | 30 M+ 28 M+ , etc.
Calculate the mean (x̄ = 30.142 85…). | SHIFT 1 Var x̄ = | RCL x̄
Check the number of scores (n = 14). | SHIFT 1 Var n = | RCL n
Return to normal (COMP) mode. | MODE COMP | MODE 0

Operation | Casio Graphics | Texas Instruments Graphics
Start statistics mode. | MENU STAT for Lists table | Y= and delete any function by highlighting it and pressing CLEAR; then STAT EDIT
Clear the statistical memory. | With cursor in List 1 column: EXIT F6 DEL-A Yes | With cursor on L1: CLEAR ENTER
Enter data. | 30 EXE 28 EXE , etc. to enter in List 1 column | 30 ENTER 28 ENTER , etc. to enter in List 1 column
Calculate the mean (x̄ = 30.142 85…). Calculate the sum of scores (Σx = 422). Check the number of scores (n = 14). | F6 CALC SET to make these settings (if different): 1Var XList: List 1, 1Var Freq: 1; EXE; 1VAR to calculate many statistics (scroll down for more) | STAT CALC 1-Var Stats ENTER to calculate many statistics (scroll down for more)

The mean, median and mode from a frequency table

EXAMPLE 2

The scores for the players in a nine-hole golf competition were sorted into the frequency table.

Score (x) | Frequency (f)
37 | 2
38 | 4
39 | 7
40 | 4
41 | 1

a How many players were there?
b For this data, find:
i  the mean (correct to one decimal place)
ii  the mode
iii  the median.

Solution
a 18 players (sum of f = 18)

b i  In the table below, fx means ‘f × x’.

Score (x) | Frequency (f) | fx
37 | 2 | 74 (2 × 37 = 74)
38 | 4 | 152 (4 × 38 = 152, etc.)
39 | 7 | 273
40 | 4 | 160
41 | 1 | 41
Totals | Σf = 18 | Σfx = 700

This means that there were two scores of 37, four scores of 38, etc. The ‘fx’ column groups
equal scores and adds them together.

$$\text{Mean, } \bar{x} = \frac{\sum fx}{\sum f} = \frac{700}{18} = 38.888\,8\ldots \approx 38.9$$
The sum of all 18 scores = 700.

ii  Mode = 39 (39 has the highest frequency, 7)

iii  To the table, add a cumulative frequency column which keeps a running total of
the frequencies.

Score (x) | Frequency (f) | Cumulative frequency
37 | 2 | 2
38 | 4 | 6 (2 + 4 = 6)
39 | 7 | 13 (6 + 7 = 13, etc.)
40 | 4 | 17
41 | 1 | 18

Because there are 18 scores, the two middle scores are the 9th and 10th scores.
Reading from the cumulative frequency column, the 6th score is 38, and the 13th
score is a 39, so the 9th and 10th scores must both be 39.
$$\text{Median} = \frac{39 + 39}{2} = 39$$
Note that the mean (38.9), median (39) and mode (39) are all at the centre of the data set.
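
The working in Example 2 can also be checked with a short Python sketch. It is an illustration under stated assumptions rather than a prescribed method: the dictionary `frequency_table` and the helper `score_at` are hypothetical names, and `max(..., key=...)` reports only one score if two scores tie for the highest frequency.

```python
# Golf scores from Example 2: score -> frequency.
frequency_table = {37: 2, 38: 4, 39: 7, 40: 4, 41: 1}

total_f = sum(frequency_table.values())                     # sum of f
total_fx = sum(x * f for x, f in frequency_table.items())   # sum of fx
mean = total_fx / total_f

# Mode: the score with the highest frequency (one score only if there is a tie).
mode = max(frequency_table, key=frequency_table.get)


def score_at(position, table):
    """Return the score at a 1-based position, using a running (cumulative) total."""
    cumulative = 0
    for score in sorted(table):
        cumulative += table[score]
        if position <= cumulative:
            return score
    raise ValueError("position is beyond the number of scores")


# Median: middle score (odd total) or average of the two middle scores (even total).
if total_f % 2 == 1:
    median = score_at((total_f + 1) // 2, frequency_table)
else:
    median = (score_at(total_f // 2, frequency_table)
              + score_at(total_f // 2 + 1, frequency_table)) / 2

print(total_f, total_fx, round(mean, 1), mode, median)
```

The running total inside `score_at` plays the same role as the cumulative frequency column: the 9th and 10th scores are found by stepping through the classes until the running total reaches them.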

Alternatively, for part i, follow the instructions below to use a calculator’s
statistics mode to calculate the mean of the golf scores.



Operation | Casio Scientific | Sharp Scientific
Start statistics mode. | MODE STAT 1-VAR; SHIFT MODE scroll down to STAT, Frequency? ON | MODE STAT =
Clear the statistical memory. | SHIFT 1 Edit, Del-A | 2ndF DEL
Enter data. | SHIFT 1 Data to get table; 37 = 38 = , etc. to enter in x column; 2 = 4 = , etc. to enter in FREQ column; AC to leave table | 37 2ndF STO 2 M+ 38 2ndF STO 4 M+ , etc.
Calculate the mean (x̄ = 38.888 8…). | SHIFT 1 Var x̄ = | RCL x̄
Check the number of scores (n = 18). | SHIFT 1 Var n = | RCL n
Return to normal (COMP) mode. | MODE COMP | MODE 0

Operation | Casio Graphics | Texas Instruments Graphics
Start statistics mode. | MENU STAT for Lists table | Y= and delete any function by highlighting it and pressing CLEAR; then STAT EDIT
Clear the statistical memory. | With cursor in List 1 column: EXIT F6 DEL-A Yes. Repeat for List 2 | With cursor on L1: CLEAR ENTER. Repeat for List 2
Enter data. | 37 EXE 38 EXE , etc. to enter in List 1 column; 2 EXE 4 EXE , etc. to enter in List 2 column | 37 ENTER 38 ENTER , etc. to enter in List 1 column; 2 ENTER 4 ENTER , etc. to enter in List 2 column
Calculate the mean (x̄ = 38.888 8…). Calculate the sum of scores (Σx = 700). Check the number of scores (n = 18). | F6 CALC SET to make these settings (if different): 1Var XList: List 1, 1Var Freq: 1; EXE; 1VAR to calculate many statistics (scroll down for more) | STAT CALC 1-Var Stats and type ‘L1, L2’ by pressing 2nd 1 , 2nd 2 ENTER to calculate many statistics (scroll down for more)



The mean of grouped data
For data grouped into class intervals, an estimate of the mean can be calculated using the
class centres. It is only an estimate because, with class intervals, we do not know the exact
value of every score.

EXAMPLE 3

The ages of the patients at a medical centre in one afternoon were recorded and grouped
into this frequency table.

Age | Frequency
0–9 | 8
10–19 | 7
20–29 | 6
30–39 | 8
40–49 | 5
50–59 | 4
60–69 | 3
70–79 | 1

a Calculate, correct to one decimal place, the estimated mean age of the patients.
b How many patients went to the medical centre?

Solution

a
Age | Class centre, x | Frequency, f | fx
0–9 | 4.5 | 8 | 36
10–19 | 14.5 | 7 | 101.5
20–29 | 24.5 | 6 | 147
30–39 | 34.5 | 8 | 276
40–49 | 44.5 | 5 | 222.5
50–59 | 54.5 | 4 | 218
60–69 | 64.5 | 3 | 193.5
70–79 | 74.5 | 1 | 74.5
Totals | | Σf = 42 | Σfx = 1269

$$\text{Estimate of the mean, } \bar{x} = \frac{\sum fx}{\sum f} = \frac{1269}{42} = 30.214\,2\ldots \approx 30.2$$
Note that the estimated mean age of 30.2 is a central value of the data set.

b 42 patients (Σf = 42)
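
As a cross-check of Example 3, the class-centre method can be sketched in Python. The list name `classes` and its (lower limit, upper limit, frequency) layout are assumptions made for this illustration only.

```python
# Age classes from Example 3: (lower limit, upper limit, frequency).
classes = [
    (0, 9, 8), (10, 19, 7), (20, 29, 6), (30, 39, 8),
    (40, 49, 5), (50, 59, 4), (60, 69, 3), (70, 79, 1),
]

total_f = 0
total_fx = 0.0
for lower, upper, f in classes:
    centre = (lower + upper) / 2   # class centre, e.g. (0 + 9) / 2 = 4.5
    total_f += f
    total_fx += f * centre         # one row of the fx column

estimated_mean = total_fx / total_f
print(total_f, total_fx, round(estimated_mean, 1))
```

Each class centre stands in for every score in its class, which is why the result is only an estimate of the mean.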



The median class and modal class of grouped data

Median class and modal class


The median class is the class interval that contains the median score.
The modal class is the most common class interval(s).

EXAMPLE 4

The monthly call costs of a sample of mobile phone users were grouped as shown in the
cumulative frequency table below.

Call cost ($) | Frequency | Cumulative frequency
0 – < 20 | 6 | 6
20 – < 40 | 8 | 14
40 – < 60 | 13 | 27
60 – < 80 | 17 | 44
80 – < 100 | 23 | 67
100 – < 120 | 20 | 87
120 – < 140 | 16 | 103
140 – < 160 | 10 | 113
160 – < 180 | 4 | 117
180 – < 200 | 3 | 120

For this data, find:
a the median class
b the modal class.

Solution

a There are 120 scores. The two middle scores are the 60th and 61st scores.
From the cumulative frequency column, the 60th and 61st scores are in the
80–< 100 class.
The median class is 80 – < 100.
b The modal class is 80 – < 100. This class has the highest frequency, 23.
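
The reasoning in Example 4 can be sketched in Python as follows. The names `classes` and `class_containing` are hypothetical, and the sketch simply reports two classes if the two middle scores ever fall on either side of a class boundary, a case the example does not need to consider.

```python
# Call-cost classes from Example 4: (class label, frequency).
classes = [
    ("0-<20", 6), ("20-<40", 8), ("40-<60", 13), ("60-<80", 17),
    ("80-<100", 23), ("100-<120", 20), ("120-<140", 16),
    ("140-<160", 10), ("160-<180", 4), ("180-<200", 3),
]


def class_containing(position, table):
    """Return the label of the class that holds the score at a 1-based position."""
    cumulative = 0
    for label, f in table:
        cumulative += f            # running total = cumulative frequency
        if position <= cumulative:
            return label
    raise ValueError("position is beyond the number of scores")


n = sum(f for _, f in classes)

# Positions of the middle score(s): one for an odd total, two for an even total.
middle_positions = [(n + 1) // 2] if n % 2 == 1 else [n // 2, n // 2 + 1]
median_classes = {class_containing(p, classes) for p in middle_positions}

# Modal class(es): every class that shares the highest frequency.
highest = max(f for _, f in classes)
modal_classes = [label for label, f in classes if f == highest]

print(n, middle_positions, median_classes, modal_classes)
```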



Comparing measures of central tendency
A measure of central tendency, such as the mean, median or mode, describes the centre
or average of a set of data. The following table summarises the three measures of central
tendency.

Measure of central tendency | Features | When it is most appropriate
Mean: $\bar{x} = \frac{\text{sum of scores}}{\text{number of scores}} = \frac{\sum x}{n} = \frac{\sum fx}{\sum f}$ | Depends on all scores in the data set. Is affected by outliers. | When the data set does not have many outliers
Median: middle score or average of two middle scores | Not affected by outliers | When the data set has many outliers, for example house prices, salaries
Mode: most popular score(s) | Not affected by outliers | When the most common score or category is needed (for example dress size); also useful for categorical data

EXAMPLE 5

Which measure of central tendency is most appropriate for describing each of the
following averages?
a the average price of a new car

b the most common number of bedrooms in a house

c a cricket player’s batting average

d average weekly income

Solution

a Median, because there would be many outliers (the prices of expensive cars).

b Mode, because the most frequent score is needed.

c Mean, because all scores are required in the calculation.

d Median, because there would be many outliers (the incomes of very rich people).



EXAMPLE 6

Ten houses were sold this week at Nelson Lakes for the following prices.

$376 000 $1 200 000 $270 000 $308 000 $372 000
$409 000 $387 000 $582 000 $460 000 $238 000

a Calculate the mean house price.


b Calculate the median house price.
c Which measure of central tendency is higher, the mean or the median?
d Which measure is more appropriate to describe the average house price?

Solution

a $$\text{Mean, } \bar{x} = \frac{\$4\,602\,000}{10} = \$460\,200$$

b Prices in order:

$238 000 $270 000 $308 000 $372 000 $376 000
$387 000 $409 000 $460 000 $582 000 $1 200 000

$$\text{Median} = \frac{\$376\,000 + \$387\,000}{2} = \$381\,500$$

c The mean is higher. Note that eight of the ten house prices are below the mean ($460 200).

d The median, because it is not distorted by the outlier of $1 200 000.
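
To see how strongly the outlier in Example 6 pulls the mean, the figures can be recomputed with and without it. This is a rough sketch using Python's standard `statistics` module; removing the highest price is an extra step added here for illustration and is not part of the worked solution.

```python
from statistics import mean, median

# House prices from Example 6, in dollars.
prices = [376_000, 1_200_000, 270_000, 308_000, 372_000,
          409_000, 387_000, 582_000, 460_000, 238_000]

outlier = max(prices)                                   # the $1 200 000 sale
without_outlier = [p for p in prices if p != outlier]

print("All ten sales:   ", mean(prices), median(prices))
print("Outlier removed: ", mean(without_outlier), median(without_outlier))

# How many of the ten prices sit below the mean?
below_mean = sum(1 for p in prices if p < mean(prices))
print("Prices below the mean:", below_mean)
```

Dropping the single outlier moves the mean by tens of thousands of dollars while the median shifts only slightly, which is the behaviour part d relies on.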

Exercise 10.01  The mean, median and mode


1 For each set of data below, find:
i  the mean ii  the median iii  the mode.
a 1 1 2 5 5 7 9 10
b 37 31 35 39 31 32 34 32 35 38
c 28 40 38 42 45 29 31 41 30
d 5 8 14 9 10 7 11 15 8 7 5



2 The stem-and-leaf plot below represents the number of points scored by the Sharks in
every round of the football season.

Stem | Leaf
0 | 6 6
1 | 2 3 4 4 4 8 8 9
2 | 0 0 0 5 6
3 | 0 0 2 4 4 6 7
4 | 0
5 |
6 | 2

a How many rounds were played in the season?
b Calculate the mean score (correct to the nearest whole number).
c Find the median number of points scored.
d What is the mode?

3 Ngaire is training for a triathlon. She swam the following times, in minutes,
in her last 10 races.
28 34 22 24 25 24 26 26 24 27

a Which of the following is Ngaire’s mean swim time? Select A, B, C or D.


A 24 B 25 C 25.5 D 26
b Which of the following was her median swim time in minutes?
Select A, B, C or D.
A 24 B 25 C 25.5 D 26
c Which of the following was Ngaire’s modal swim time for the 10 races?
Select A, B, C or D.
A 24 B 25 C 25.5 D 26

4 ‘Average contents 50’ is printed on each box of Meg’s Matches. A quality controller
counted the contents of a sample of 160 matchboxes from the production line and
tabulated the results, as shown below.

Number of matches (x) | Frequency (f)
48 | 10
49 | 45
50 | 52
51 | 39
52 | 9
53 | 5

a Use an ‘fx’ column, or your calculator’s statistics mode, to calculate the mean
number of matches per box, correct to one decimal place.
b Is the claim ‘Average contents 50’ justified? Give a reason for your answer.
c Find the mode.
d Find the median.



5 This dot plot shows the number of children in each family living on Willard Crescent.
[Dot plot: Number of children per family, axis from 0 to 8]
a How many families live on Willard Crescent?
b Use a frequency table, or your calculator’s statistics mode, to calculate the mean
number of children per family.
c What is the median?
d What is the mode?
e What is the outlier?
f If the outlier is removed from the data set, how does this affect:
i the mean? ii  the median? iii  the mode?

6 This frequency histogram shows the number of mobile phone calls made by Elena each
day over a number of days.
[Frequency histogram: Elena’s mobile calls. Horizontal axis: Number of calls per day (2 to 7); vertical axis: Frequency (0 to 5)]
a Draw a frequency table for this data, including an ‘fx’ column.
b Over how many days was the number of calls Elena made recorded?
c Find the mode of this data.
d Find the median of this data.
e Calculate the mean number of phone calls made by Elena per day, correct to one
decimal place.

7 The police used radar to check the speeds of motor vehicles driving in a 40 km/h zone
outside a local primary school one morning. They recorded the results in the table below.

Speed (km/h) | Number of cars, f
36–40 | 64
41–45 | 36
46–50 | 18
51–55 | 15
56–60 | 11
61–65 | 5

a Add a column of class centres to the table and calculate an estimate for the mean
speed of the vehicles, correct to two decimal places.
b How many motor vehicles had their speeds checked?

8 The heights of young trees in a section of nursery were measured before planting. The
results are shown in the table below.

Height (cm) | Number of trees
20–29 | 28
30–39 | 45
40–49 | 74
50–59 | 63
60–69 | 24

For this data, find:
a the median class
b the modal class.



9 This dot plot shows the minimum daily temperatures (in °C) in Camden over a
3-week period.
[Dot plot: Minimum daily temperatures (°C), axis from –2 to 8]
a What is the mode?
b What is the median?
c Calculate the mean, correct to one decimal place.

10 The weekly wages of the staff at Yen’s restaurant are shown in the frequency table.

Wage ($) | Number of employees
100 – < 200 | 5
200 – < 300 | 11
300 – < 400 | 20
400 – < 500 | 4
500 – < 600 | 3
600 – < 700 | 1

a What is the modal class for the wages?
b What is the median class?

11 Decide which M (mean, median or mode) is correct for each of the following.
a This M takes all scores in the data set into account.
b This M is one of the scores if there is an odd number of scores.
c Half of the scores are above this M, the other half are below.
d There can be more than one M in a set of data.
e This M often needs to be rounded to decimal places.
f This M can also be used for categorical data.
g This M can be distorted by many outliers.
h This M must be one of the scores in the data set.

12 Which measure of central tendency is most appropriate for describing each average?
a the average exam mark for the class
b the average shirt size for teenage girls
c the average rent paid for a house in Sydney
d the average screen size of a notebook computer
e the average mass of football players in a team
f the average brand of mobile phone



13 A small business employs staff with the following salaries:
• general manager $158 300
• three factory hands $64 300 each
• supervisor $85 600
• two clerical officers $68 500 each
a How many people are there on staff?
b Calculate the mean salary of the staff, correct to the nearest $100.
c Calculate the median salary of the staff.
d Which measure of central tendency is higher, mean or median? Why?
e Which measure of central tendency best describes the average salary at this
business?

14 The ages of the maths teachers at Westvale Christian College are:


49 32 37 32 25 41 39 50

a For this data, find:


i  the mean ii  the median iii  the mode.
b The 39-year-old teacher is replaced by a new teacher, aged 22. Describe how this
will affect:
i  the mean ii  the median iii  the mode.

15 The colours of the new cars sold last week at Huxley Motors were recorded. The results
are shown in the table below.
Colour Black Blue Red Silver White
Frequency 4 7 7 9 12

a How many new cars were sold?


b What is the mode for this data?
c Why is the mode the only valid measure of central tendency here?

16 The weekly mortgage repayments (in dollars) of 11 home owners are:


370 628 299 417 354 1027 585 435 509 652 481

a For this data, find:


i the mean, correct to the nearest dollar
ii the median
iii the mode.
b Why isn’t the mean or mode an appropriate measure of central tendency for this set
of data?
c If the outlier is removed from the data, check whether the new mean will be closer
to the new median than the mean was to the median for the original set of data.



17 The dot plot shows the shoe sizes of a sample of Year 11 students.
[Dot plot: Shoe size, axis from 6 to 12]
a For this data, find:
i  the mean ii  the median iii  the mode.
b If the outlier is removed, state what will happen to:
i the mean ii the mode.
c A shoe store needs to buy more shoes for a back-to-school sale. Which measure of
central tendency is most appropriate for the store to use in this situation?

18 The stem-and-leaf plot below shows the maximum daily temperatures (in °C) in Port
Macquarie for the last two weeks in December.

Stem | Leaf
2 | 2 4 4 5 6 6 7 7 7 8 8 9
3 | 1 4
Source: © Copyright Commonwealth of Australia 2017, Bureau of Meteorology

a For this data, find:
i the mean ii the median iii the mode.
b Which measure of central tendency is the most appropriate for describing the
average maximum daily temperature?

TECHNOLOGY
Calculating measures of central tendency
Step 1: Open a blank spreadsheet to enter the following temperature data about
Mudgee from Example 1 on page 403.

Step 2: In cell E5, enter the formula =AVERAGE(A2:G3) to calculate the mean
(30.142 85…).
Step 3: In cell E6, enter the formula =MEDIAN(A2:G3) to calculate the median
(31.5).
Step 4:  In cell E7, enter the formula =MODE(A2:G3) to calculate the mode (32).

If there is more than one mode in a data set, the spreadsheet displays only one of the modes.
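
For readers working in Python rather than a spreadsheet, a comparable check can be sketched with the standard `statistics` module. This is offered as an aside, not as part of the spreadsheet activity; the pairing of each function with a spreadsheet formula in the comments is only a loose analogy, and `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, mode, multimode

# Maximum daily temperatures (°C) in Mudgee from Example 1a.
temperatures = [30, 28, 26, 31, 34, 35, 32, 33, 21, 25, 28, 32, 32, 35]

print(mean(temperatures))     # compare with =AVERAGE(...)
print(median(temperatures))   # compare with =MEDIAN(...)
print(mode(temperatures))     # compare with =MODE(...)

# Test marks from Example 1b, which have three modes (47, 59 and 65).
marks = [44, 47, 47, 48, 52, 56, 58, 59, 59, 61, 63, 65, 65,
         67, 68, 70, 72, 73, 74, 75, 83, 87, 88, 92, 98]

print(mode(marks))        # reports a single mode only (the first one met)
print(multimode(marks))   # reports every mode
```

Like the spreadsheet, `mode` reports just one value when several scores tie, so `multimode` is the safer choice when a data set may have more than one mode.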



10.02  Quartiles, deciles and percentiles
Quantiles are points of a distribution or data set that separate the data into equal groups
after the data has been sorted into order. Commonly used quantiles are quartiles, deciles
and percentiles.

The median and quartiles

Quartiles
The three quartiles of a data set are those values that separate the data into quarters.
• The lower quartile, Q1 or QL, separates the bottom quarter (25%) of scores from
the rest of the scores.
• The upper quartile, Q3 or QU, separates the top quarter (25%) of scores from the
rest of the scores.
• The middle quartile, Q2, is the median, and separates the two middle quarters.

These speeds (in km/h) were recorded for 11 cars driving along a major country road:

104 86 95 100 81 120 84 78 93 92 107


When we sort the scores in ascending order, we can find the quartiles:

78 81 84 86 92 93 95 100 104 107 120

Q1 = 84 Q2 = 93 Q3 = 104

• A speed of 81 km/h is in the bottom quarter of scores.
• A speed of 100 km/h is in the 2nd top quarter of scores.
• A speed of 107 km/h is in the top quarter of scores.

Quartiles of a data set


To find the quartiles of a data set:
Step 1:  Sort the scores in order, find the median and call it Q2.
Step 2:  Find the median of the bottom half of scores and call it Q1.
Step 3:  Find the median of the top half of scores and call it Q3.
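
The three steps above can be sketched in Python. The function name `quartiles` is hypothetical, and the sketch assumes that, for an odd number of scores, the median itself is left out of both halves, which matches the 11 car speeds listed earlier in this section. Other software may use different quartile conventions and give slightly different values.

```python
from statistics import median


def quartiles(scores):
    """Return (Q1, Q2, Q3) using the median-of-each-half method."""
    ordered = sorted(scores)                 # Step 1: sort, then find the median
    n = len(ordered)
    q2 = median(ordered)
    # For an odd number of scores the median belongs to neither half.
    bottom_half = ordered[: n // 2]          # Step 2: bottom half
    top_half = ordered[(n + 1) // 2:]        # Step 3: top half
    return median(bottom_half), q2, median(top_half)


# Speeds (km/h) of the 11 cars listed earlier in this section.
speeds = [104, 86, 95, 100, 81, 120, 84, 78, 93, 92, 107]
print(quartiles(speeds))
```

Slicing with `n // 2` and `(n + 1) // 2` handles both cases: for an even count the two halves meet in the middle, and for an odd count the middle score is skipped.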



EXAMPLE 7

Find the quartiles for each data set below.


a The marks obtained by a class of students for an art project are:
51 41 60 38 46 57 39 61 43 64

b The scores obtained by a golfer for the first nine holes of a golf course are:
4 3 5 6 4 3 8 6 6

Solution

a First, sort the marks and place them in order:

38 39 41 43 46 51 57 60 61 64

Q1 = 41
$Q_2 = \frac{46 + 51}{2} = 48.5$
Q3 = 60

b 3 3 4 4 5 6 6 6 8

$Q_1 = \frac{3 + 4}{2} = 3.5$
Q2 = 5
$Q_3 = \frac{6 + 6}{2} = 6$

Deciles
Quartiles (Q1, Q2 and Q3) separate data into quarters.
Deciles (D1, D2, D3, D4, D5, D6, D7, D8 and D9) separate data into tenths. Deci- means
‘one tenth’.
For example:
• D1 cuts off the lowest 10% of scores.
• D4 cuts off the lowest 40% of scores.
• D9 cuts off the lowest 90% of scores (or the top 10% of scores).



EXAMPLE 8

The lengths (in centimetres) of 20 newborn infants at a hospital were recorded:


51 49 52 49 47 56 48 48 52 50
55 49 48 51 44 52 50 50 53 45

a What is the 3rd decile for this data?


b What is the 5th decile for this data?
c What is another name for the 5th decile?
d Find the value that separates the bottom 70% of lengths from the top 30%.
e If the length of newborn baby James is in the top 10% of infant lengths, what value
must it be greater than?

Solution
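Place the values in order first:

44 45 47 48 48 48 49 49 49 50
50 50 51 51 52 52 52 53 55 56

With 20 scores, a decile falls after every second score: D1 lies between the 2nd and 3rd
scores, D2 between the 4th and 5th scores, and so on up to D9 between the 18th and 19th scores.

a $D_3 = \frac{48 + 49}{2} = 48.5$
b $D_5 = \frac{50 + 50}{2} = 50$
c The median, because it cuts off the lowest 50% of scores.
d $D_7 = \frac{51 + 52}{2} = 51.5$
e James’ length must be greater than $D_9 = \frac{53 + 55}{2} = 54$.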
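
The decile calculations in Example 8 can be sketched in Python. The function name `decile` is hypothetical. Averaging the two scores on either side of the cut follows the worked solution; the rule used when the cut does not fall exactly between two scores (take the next score up) is an assumption added for completeness, since the text does not cover that case.

```python
import math


def decile(scores, k):
    """Return the k-th decile (k = 1 to 9): the value cutting off the lowest k tenths."""
    ordered = sorted(scores)
    n = len(ordered)
    if (k * n) % 10 == 0:
        # The cut falls exactly between two scores: average them (as in Example 8).
        below = k * n // 10                  # number of scores below the cut
        return (ordered[below - 1] + ordered[below]) / 2
    # Assumption (not from the text): otherwise take the next score up.
    return ordered[math.ceil(k * n / 10) - 1]


# Lengths (cm) of the 20 newborn infants from Example 8.
lengths = [51, 49, 52, 49, 47, 56, 48, 48, 52, 50,
           55, 49, 48, 51, 44, 52, 50, 50, 53, 45]

for k in (3, 5, 7, 9):
    print(f"D{k} =", decile(lengths, k))
```

With 20 scores every decile lands between two scores, so only the averaging branch is used here.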

Percentiles
Percentiles (P1, P2, P3, ... P99) separate data into hundredths.
For example:
• P24 cuts off the lowest 24% of scores
• P60 cuts off the lowest 60% of scores
• P87 cuts off the lowest 87% of scores (or the top 13% of scores).
Deciles and percentiles are only meaningful when analysing large sets of data.



EXAMPLE 9

The following information is based on population data for the heights of girls aged
16 years.
• The median is 163 cm.
• The 3rd quartile Q3 = 167 cm.
• The 9th decile D9 = 171 cm.
• The 5th percentile P5 = 152 cm.
• The 97th percentile P97 = 175 cm.
In the following questions, all of the girls mentioned are aged 16.
a Holly’s height is 175 cm. Is she tall for her age and what percentage of 16-year-old
girls are taller than her?
b Olga is taller than 90% of girls her age. What is her height?
c If $\frac{1}{4}$ of girls her age are taller than Verity, how tall is she?
d What height separates the bottom 5% of heights from the top 95%?
e What percentile is a height of 163 cm?

Solution

a Yes, P97 = 175 cm, which means Holly is taller than 97% of girls her age. So only
3% of girls aged 16 are taller than her.
b Olga’s height = P90 = D9 = 171 cm.
c Verity is taller than $\frac{3}{4}$ of girls her age, so her height is P75 = Q3 = 167 cm.
d P5 = 152 cm

e 163 cm is the median, so it is also the 50th percentile P50 (the height that cuts off
the lowest 50% of scores). The median is the 2nd quartile Q2, the 5th decile D5 and the
50th percentile P50.

Exercise 10.02  Quartiles, deciles and percentiles

1 Find the quartiles Q1, Q2 and Q3 for each data set below.
a The times, in seconds, to run 100 metres:
8.7 9.1 11.0 13.5 10.6 8.9 10.1 9.6
12.3 9.9 9.0 10.8 9.2 13.1 10.6



b The number of matches in a box:
49 50 52 48 50 51 49 50 52 51 50 50

c The prices, in dollars, of a bag of potatoes:


3.50 3.20 3.50 4.10 3.00 3.50 3.90 2.80 3.40 3.00

d The weekly rainfall, in millimetres, over three months:


16 24 18 26 21 27 5
7 17 21 9 0 22

2 The stem-and-leaf plot below shows the game scores of a group of ten-pin bowlers.

Stem | Leaf
8 | 2 7 8
9 | 0 3 4 6 9
10 | 4 4 5 8 8 8
11 | 2 3 4 6 7 9 9
12 | 0 0 5 6 6 8
13 | 1 1 4 7 9

For this data, find:
a the median
b the lower quartile
c the upper quartile.

3 The dot plot shows the number of vehicles driving past Westvale High School per
minute in a 20-minute period.
[Dot plot: Number of vehicles per minute, axis from 2 to 10]
Which of the following is the upper quartile Q3? Select A, B, C or D.
A 7.5 B 7 C 8.5 D 8
4 The percentage scores of a class of 30 students in a science test are shown below.

61 75 46 78 81 95 67 61 50 74
100 57 83 64 69 95 85 89 66 45
71 87 84 80 63 92 64 75 97 60
a What is the 8th decile?
b What is the 3rd decile?
c What is the 40th percentile?
d Find the value that cuts off the lower 20% of scores from the upper 80%.
e What percentage of students scored higher than 79?
5 For the data shown in the dot plot in Question 3, find:
a the 1st decile
b the 5th decile
c the value that cuts off the lower 70% of scores
d the value that cuts off the top 60% of scores
e the 90th percentile.



6 The following information is based on population data for the body mass indices
(BMI, in kg/m²) of boys aged 16 years.
• The 1st quartile Q1 = 18.8.
• The 1st decile D1 = 17.6.
• The 9th decile D9 = 25.4.
• The 50th percentile P50 = 20.6.
• The 97th percentile P97 = 29.4.
In the questions below, all of the boys mentioned are aged 16.
a Sanjay has the median BMI for boys his age. What is his BMI?
b Michael has a BMI of 18.8. Is this high for his age? What percentage of boys aged
16 have a BMI lower than him?
c 10% of boys aged 16 have a higher BMI than Harley. What is his BMI?
d Adrian has a BMI of 29.4. Is this high for his age? What percentage of boys aged 16
have a BMI higher than him?
e What percentile is a BMI of 17.6?

7 The information below is based on weather records kept by the Bureau of Meteorology
for the maximum daily temperatures in November for Newcastle.
•  The mean is 23.5°C.
•  The highest temperature on record was 41.0°C (on 19 November 1968).
•  The lowest temperature on record was 15.6°C (on 19 November 1986).
•  The 1st decile D1 = 18.9°C.
•  The 9th decile D9 = 28.6°C.
© Copyright Commonwealth of Australia 2017, Bureau of Meteorology

a What is the range of temperatures?


b What percentage of temperatures were higher than 28.6°C?
c To what value would you expect the median to be close (but not necessarily equal)?
d What value is higher than 10% of all temperatures recorded?
e What is the size of the 9th decile band (the difference between the highest
temperature and the 9th decile)?

8 True or false?
a P75 = Q1 b P60 = D6 c P50 = Median
d Q3 = P75 e D8 = P20 f Q2 = D5



9 This table was published by the Universities Admissions Centre (UAC), giving the
percentiles of different Australian Tertiary Admission Ranks (ATARs) for the 2015 HSC.

Percentile 40 50 60 70 80 85 90 95 99 100
ATAR 61.65 68.65 75.25 81.60 87.85 90.90 93.95 96.95 99.40 99.95
© 2016 Universities Admissions Centre (NSW & ACT)

a What percentage of HSC students scored an ATAR:


i below 75.25?
ii above 61.65?
iii between 93.95 and 99.40?
b What percentage of students scored an ATAR above the 90th percentile?
c Only 9.1% of students scored an ATAR of above what value?
d What is the median ATAR for the 2015 HSC?
e What is the percentile of an ATAR of 81.6?

10 This table shows the percentiles for the heights (in cm) of girls aged 2 to 5 years,
according to the child growth standards of the World Health Organization (WHO).

Age (years) P5 P25 P50 P75 P85 P99


2 80.4 83.5 85.7 87.9 89.1 93.2
2.5 84.9 88.3 90.7 93.1 94.3 98.9
3 88.8 92.5 95.1 97.6 99.0 103.9
3.5 92.4 96.3 99.0 101.8 103.3 108.5
4 95.6 99.8 102.7 105.6 107.2 112.8
4.5 98.7 103.1 106.2 109.2 110.9 116.7
5 101.6 106.2 109.4 112.6 114.4 120.5
© WHO 2017

a What is the median height of a 4-year-old girl?


b Libby is aged 2.5 and is 88.3 cm tall. Is she tall for her age? What percentage of
girls her age are shorter than her?
c What is Libby’s expected height when she turns 5 years old?
d Only 15% of girls Renee’s age are taller than her. How tall is she if she is 3.5 years
old?
e Mikayla is 2 years old and 93.2 cm tall. Is she short for her age? What percentage of
girls her age are taller than her?
f Mia is aged 3 and her height is at the 3rd quartile. What is her height now and in
18 months’ time?

ISBN 9780170413565 10. Analysing data 423


11 This stature-for-age percentiles chart shows the range of heights for boys aged
2 to 20 years.
[Chart: Stature-for-age percentiles: Boys, 2 to 20 years. Horizontal axis: Age (years), from 2 to 20. Vertical axis: Height (cm), from 75 to 190. Curves are shown for the 3rd, 5th, 10th, 25th, 50th, 75th, 90th, 95th and 97th percentiles.]
Source: Developed by the National Center for Health Statistics in collaboration with the National Center for Chronic Disease Prevention and
Health Promotion (2000) http://www.cdc.gov/growthcharts

a Adam is aged 9 and 129 cm tall. What percentage of boys his age are shorter than him?
b Justin is 11 years old and 155 cm tall. What percentage of boys his age are shorter
than him?
c How tall should Justin be when he turns 18?
d Liong is 103 cm tall, which is at the 1st decile for boys his age. How old is Liong?
e Asam is 16 and his height is at the 3rd quartile.
i What is Asam’s height now?
ii What will Asam’s height be when he turns 20 years old?



DID YOU KNOW?

Healthy growth charts for children


In 2006, the World Health Organization (WHO) started publishing growth charts
based on good health standards rather than the general population. They selected 8440
children who grew up in optimal healthy environments, from six countries: Brazil,
Ghana, India, Norway, Oman and USA. These children were chosen because they were
well-fed, breastfed as infants, not obese, their mothers did not smoke, and they had access
to good health care where infections were controlled and prevented.
Selecting children from six different countries to represent the world’s children
is an example of stratified sampling. Why do you think the WHO chose those
particular countries?

TECHNOLOGY
Calculating quartiles and percentiles
A spreadsheet can be used to calculate the quartiles, deciles and percentiles
of a set of data.
Step 1: Open a blank spreadsheet and enter the data in rows 2 and 3 (cells B2 to K3), using the
infant lengths from Example 8 on page 419.

Step 2: In cell F5, enter the formula = QUARTILE(B2:K3,1) to calculate Q1 = 48.
Step 3: In cell F6, enter = QUARTILE(B2:K3,3) to calculate Q3 = 52.
Step 4: In cell F7, enter = PERCENTILE(B2:K3,0.2) to calculate D2 = 48.
Step 5: In cell F8, enter = PERCENTILE(B2:K3,0.7) to calculate D7 = 51.3.
Step 6: In cell F9, enter = PERCENTILE(B2:K3,0.32) to calculate P32 = 49.
Step 7: In cell F10, enter = PERCENTILE(B2:K3,0.95) to calculate P95 = 55.05.
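
The spreadsheet steps above have close equivalents in Python's standard library. The sketch below assumes that QUARTILE and PERCENTILE interpolate linearly between the ordered scores, which corresponds to method='inclusive' in statistics.quantiles. Results from this method can differ slightly from quartiles found by halving the ordered data. The data list is hypothetical (it is not the infant lengths from Example 8) and the function name percentile is illustrative only.

```python
from statistics import quantiles

def percentile(scores, p):
    """Return the p-th percentile (p from 1 to 99) using linear interpolation."""
    # quantiles(..., n=100) returns the 99 cut points P1, P2, ..., P99
    cut_points = quantiles(scores, n=100, method='inclusive')
    return cut_points[p - 1]

# Hypothetical data, for illustration only
scores = [2, 4, 4, 5, 7, 9, 10, 12]

q1 = percentile(scores, 25)   # lower quartile, Q1 = P25
q3 = percentile(scores, 75)   # upper quartile, Q3 = P75
d7 = percentile(scores, 70)   # 7th decile, D7 = P70
print(q1, q3, d7)
```

Deciles and quartiles are read off as percentiles here (D7 = P70, Q3 = P75), so one function covers all three measures.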



10.03  The range and interquartile range
While the mean, median and mode describe the centre of a data set, there are three summary
statistics that describe the spread of data: the range, the interquartile range and the
standard deviation. These are called measures of spread.

Range and interquartile range

Range = highest score − lowest score

Interquartile range (IQR) = upper quartile − lower quartile = Q3 − Q1

Standard deviation will be explained later in this chapter.

EXAMPLE 10
For each data set below, find:
i  the range ii  the interquartile range.
a The maximum daily temperature (in °C) in Mudgee for the first two weeks in
January:

30 28 26 31 34 35 32 33 21 25 28 32 32 35

b The body temperatures (in °C) of a sample of hospital patients, as shown in the dot
plot on the right.

36 37 38 39 40 41 42 °C
Patients’ temperatures

Solution

a    i Range = 35 – 21 = 14
ii  Placing the scores in order:
21 25 26 28 28 30 31 32 32 32 33 34 35 35

Q1 = 28, Q3 = 33 (Q2, the median, lies between the 7th and 8th scores)

IQR = Q3 − Q1
= 33 − 28
= 5



b    i Range = 42 − 36 = 6

[Dot plot: Patients’ temperatures, 36 to 42 °C, with Q1, Q2 and Q3 marked.]
Of 9 scores, the median, Q2, is the 5th score, counting upwards from the left.

ii  Q1 = (37 + 37) ÷ 2 = 37, Q3 = (38 + 39) ÷ 2 = 38.5
IQR = Q3 − Q1
= 38.5 − 37
= 1.5

The range represents the total spread of scores but it is not a good measure if there are
outliers. The interquartile range is not affected by outliers, because it measures the range of
the middle two quarters only.
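[Diagram: the range spans all of the scores. The interquartile range spans the middle 50% of scores, from the lower quartile, Q1, to the upper quartile, Q3, with the median, Q2, inside it. The lowest 25% of scores lie below Q1 and the highest 25% lie above Q3.]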
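
As a check on Example 10a, the range and interquartile range can be computed with a short Python sketch. It assumes the quartile method used in this chapter: Q1 and Q3 are the medians of the lower and upper halves of the ordered scores, leaving out the middle score when the number of scores is odd. The function names are illustrative only.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(scores)
    half = len(ordered) // 2      # size of each half; the middle score is skipped if n is odd
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)

def data_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

def interquartile_range(scores):
    """IQR = Q3 - Q1."""
    q1, q3 = quartiles(scores)
    return q3 - q1

# Maximum daily temperatures (in °C) in Mudgee from Example 10a
mudgee = [30, 28, 26, 31, 34, 35, 32, 33, 21, 25, 28, 32, 32, 35]

print(data_range(mudgee))            # 14
print(interquartile_range(mudgee))   # 5
```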

Exercise 10.03  The range and interquartile range


1 Calculate the range of each data set. (See Example 10)

a Number of accidents per month in a factory:


3 0 0 1 2 1 6 0 0 2 1 0

b A golfer’s scores for the first nine holes of a golf course:


4 3 5 6 4 3 8 6 6

c Weekly mortgage repayments, in dollars:


370 628 299 417 354 1027 585 435 509 652 481

d Times, in minutes, for the swim-leg of a triathlon:


28 34 22 24 25 24 26 26 24 27

2 Calculate the interquartile range of each data set in Question 1.



3 The dot plot shows the number of vehicles driving past Westvale High School
per minute in a 20-minute period.

[Dot plot: Number of vehicles per minute, axis from 2 to 10.]

Which of the following is the interquartile range of this data set? Select A, B, C or D.

A 2.5 B 3 C 5 D 8

4 This stem-and-leaf plot shows the marks out of 100 for a class of students in a maths test.

Stem Leaf
3 0 7
4 2 3 4 6 8
5 0 1 4 5 7 7
6 2 3 5 7 8 8
7 4 5 6 9
8 2 2 7
9 3

For this data, find:
a the range
b the interquartile range.

5 Fifteen job applicants took a short general knowledge multiple-choice quiz. Their times
(in seconds) to complete this test were:
45 37 46 34 26 15 35 61
43 48 52 38 30 44 37

a What was the range of times?


b What was the interquartile range?
c Give a possible reason for the outlier.

6 This stem-and-leaf plot represents the number of points per match scored by the
GWS Giants in a football season.

Stem Leaf
7 8 9
8 3 5 8 9
9 1 2 3 5 8
10 0 5
11 1 7
12 6 7 9
13
14 6 9
15 1 8
Which of the following is the interquartile range of this data set? Select A, B, C or D.
A 80 B 38 C 44 D 42

7 Calculate the range of the data set from Question 6.



10.04  The effect of outliers
An outlier is a very high or very low score in a data set that is clearly apart from the other
scores. It can occur for a variety of reasons and should be investigated. If it was obtained
through incorrect measurement, it should be excluded.

Outliers
An outlier is a score that is either:
• less than Q1 − 1.5 × IQR or
• greater than Q3 + 1.5 × IQR
where Q1 (or QL) is the lower quartile, Q3 (or QU) is the upper quartile, and IQR is the
interquartile range.
This is only one of many ways of determining whether a score is an outlier.

EXAMPLE 11

The following scores are marks achieved by students in a test.

11  8 12 12 15 13 10 25 12 11  7
10 13 16 10 12 16 11 12 16 17 20
Test which scores are outliers.

Solution

The scores arranged in order are:

7 8 10 10 10 11 11 11 12 12 12 12 12 13 13 15 16 16 16 17 20 25

Q1 = 11, Q2 = 12, Q3 = 16

IQR = Q3 − Q1
= 16 − 11
= 5
∴ 1.5 × IQR = 1.5 × 5
= 7.5
∴ Q1 − 1.5 × IQR = 11 − 7.5
= 3.5
Q3 + 1.5 × IQR = 16 + 7.5
= 23.5
∴ A score is an outlier if it is less than 3.5 or greater than 23.5.
∴ 25 is an outlier.
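
The outlier test in Example 11 can be sketched in Python as follows. The sketch assumes the chapter's quartile method (medians of the lower and upper halves of the ordered scores) and the 1.5 × IQR rule stated above; the function names are illustrative only.

```python
from statistics import median

def outlier_limits(scores):
    """Return (Q1 - 1.5 x IQR, Q3 + 1.5 x IQR) for a list of scores."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    q1 = median(ordered[:half])     # median of the lower half
    q3 = median(ordered[-half:])    # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(scores):
    """List every score that lies outside the two limits."""
    low, high = outlier_limits(scores)
    return [x for x in scores if x < low or x > high]

# Test marks from Example 11
marks = [11, 8, 12, 12, 15, 13, 10, 25, 12, 11, 7,
         10, 13, 16, 10, 12, 16, 11, 12, 16, 17, 20]

print(outlier_limits(marks))   # (3.5, 23.5)
print(find_outliers(marks))    # [25]
```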



Outliers and measures of central tendency
Outliers can affect the measures of central tendency of a data set.
• The mean is most affected by outliers (because its value depends on every score).
• The median can be affected, but not by much.
• The mode is not affected at all.
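
These three effects can be seen in a small Python sketch. It reuses the test marks from Example 11, where 25 was found to be an outlier, and compares the mean, median and mode with and without that score. The variable names are illustrative only.

```python
from statistics import mean, median, mode

# Test marks from Example 11 (25 is the outlier)
marks = [11, 8, 12, 12, 15, 13, 10, 25, 12, 11, 7,
         10, 13, 16, 10, 12, 16, 11, 12, 16, 17, 20]
without_outlier = [x for x in marks if x != 25]

for label, data in (("with outlier", marks), ("without outlier", without_outlier)):
    print(label, round(mean(data), 2), median(data), mode(data))

# The mean falls from about 13.14 to about 12.57 when 25 is removed,
# while the median and the mode both stay at 12.
```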

EXAMPLE 12

The dot plot shows the temperatures of patients in a hospital ward.

[Dot plot: Temperature, axis from 36 to 43 °C.]

a Calculate the mean, mode and median of this data set.
b What is the outlier temperature?
c Calculate the mean, mode and median of this data set if the outlier is excluded.
d Describe the effect the outlier has on the measures of central tendency of the
distribution.

Solution
a Mean = 535 ÷ 14 ≈ 38.2
Mode = 39
Median = (38 + 38) ÷ 2 = 38        The average of the 7th and 8th scores
b From the dot plot, Q1 = 37, Q3 = 39
IQR = Q3 − Q1
= 39 − 37
= 2
1.5 × IQR = 1.5 × 2
= 3
If 43 is an outlier, it must be greater than Q3 + 1.5 × IQR.
Q3 + 1.5 × IQR = 39 + 3
= 42
∴ the outlier is 43.



c Mean = 492 ÷ 13 ≈ 37.8
Mode = 39
Median = 38
d The high outlier does not affect the mode and median but it increases the mean.

Exercise 10.04  The effect of outliers


1 The following scores are the number of goals scored by a hockey team during a season. (See Example 11)

3 2 0 0 1 2 3 2 4 8
2 3 5 2 1 3 4 4 2 3
a Find the interquartile range.
b Find the value of:
i Q1 − 1.5 × IQR ii Q3 + 1.5 × IQR
c Is the score of 8 goals an outlier? Give reasons.

2 Determine whether each data set has outliers.


a 2 5 6 6 7 8 10 10 15
b  
9 13 13 14 14 15 15 15 15
16 16 16 16 16 17 17 18
c Stem Leaf
1 2 9
2 0 3 4 4 8
3 4 5 6 7
4 1 4 9
5 0 2
6 8

d Score (x) Frequency (f)
4 3
5 12
6 4
7 3
8 0
9 1

3 The employees at the Bread and Butter Cafe earned the following wages in a week. (See Example 12)

$450  $520  $610  $230  $900  $420  $590
a What is the mean wage?
b What is the median wage?
c Find the interquartile range.
d The manager’s wage is an outlier. What is this wage and how do we verify that it is
an outlier?
e If the manager’s wage is not included, how does this affect the mean and median
wage?
f If each employee receives a 10% pay rise, what will be the new mean and median
wage? Is it 10% more than the old mean and median?



4 The number of cups of coffee drunk by a sample of HSC exam markers in one night is
shown in the table.

Cups of coffee No. of markers
2 1
3 4
4 5
5 9
6 0
7 0
8 1

a How many markers were surveyed?
b What is the outlier?
c What is the mean if the outlier:
i is included?    ii  is not included?
d If the outlier is included, what effect does this have on the mean number of cups of
coffee that were drunk?

5 A group of friends goes to the cinema. The ages of the group are
13 12 11 14 12 15 14 13.

If Kait brings her 5-year-old sister as well, what will happen? Select A, B, C or D.
A The median age increases. B The median age decreases.
C The mean age increases. D The mean age decreases.

6 In a netball tournament of five matches, the points scored by three teams are:

The Wombats 24 18 14 6 22
The Possums 16 16 15 18 15
The Koalas 36 8 14 16 12

a What are the mean and median scores for each team?
b Which team is the most consistent? Why?
c An error was made in the scoring for the Wombats – the score of 6 should have
been 16. What are the new mean and median?
d Which team is most consistent now? Why?

7 Sam and Terri sell copiers. The numbers of copiers that they sell each week are sorted in
ascending order.
Sam 1 2 3 3 5 6 7 8 12 25
Terri 3 3 3 14 16 18 18 24 32 35

a What is the modal number of copiers sold by each person?


b What could you say about each person if you only knew the mode?
c What is the median number of copiers sold by each person?
d What is the mean number of copiers sold by each person?
e Which measure of central tendency, mean, median or mode, is best for comparing
their sales performances?
f Who is the better salesperson? Justify your answer.



8 This dot plot represents the number of accidents at a factory each month over a year.

0 1 2 3 4 5 6 7 8 9
Accidents/month

a Calculate the mean, mode and median of this data set.


b What is the outlier number of accidents? Explain why.
c Calculate the mean, mode and median of this data set if the outlier is excluded.
d Describe the effect the outlier has on the three measures of central tendency.

9 Rupert’s bookstore employs the following people with annual wages as shown.
1 store manager $73 800
2 cashiers $34 200 each
2 part-time clerical staff $28 500 each
3 salespeople $46 500 each
2 part-time cleaners $13 500 each

a Find the mean, median and modal annual salary for the 10 employees.
b Which measure of central tendency would Rupert use to make the salaries appear
higher? Why?
c Which measure best represents the average wage for an employee at Rupert’s
bookstore? Why?

DID YOU KNOW?

The Challenger space shuttle disaster


In January 1986, an engineer working on the space shuttle program at NASA predicted
that at low air temperatures, the potential for damage to the shuttle would be extremely
high. For a temperature of 12°C, he calculated a damage index of 11. He compared this
to data from previous flights (as shown in the table below) and recommended that the
Challenger flight be delayed due to the low air temperature on the day.

Flight                   Previous flights      1986
Air temperature (°C)     26   14   19   23     12
Damage index              0    4    0    0     11

However, his advice was ignored and the outlier was not considered important enough
to delay the flight. The Challenger exploded just after takeoff, killing all seven astronauts.
Later it was found that two rubber O-rings had failed to seal a joint at low temperatures,
causing the shuttle to disintegrate.
Give another example of when an outlier should not be ignored.



10.05  Cumulative frequency graphs
A cumulative frequency histogram is a column graph of cumulative frequency.
A cumulative frequency polygon, also called an ogive (pronounced ‘oh-jive’), is drawn by
joining the top right-hand corner of each column of a cumulative frequency histogram.

EXAMPLE 13

The maximum daily temperatures (in °C) in Campbelltown in June were recorded and
grouped into the frequency table.

Temperature (°C) Frequency Cumulative frequency


12 1  1
13 2  3
14 6  9
15 2 11
16 6 17
17 3 20
18 6 26
19 1 27
20 2 29
21 1 30

a Draw a cumulative frequency histogram and polygon for the data.


b Use the cumulative frequency polygon to find the median and calculate the interquartile range.

Solution

a [Graph: June temperatures in Campbelltown. Cumulative frequency histogram and ogive, with Temperature (°C) from 12 to 21 on the horizontal axis and Cumulative frequency from 0 to 30 on the vertical axis. Marked on the graph: Q1 = 14, median = 16, Q3 = 18.]
The ogive (polygon) is contained inside the columns.



b Draw a horizontal line from the halfway mark (15) on the cumulative frequency
axis to where it meets the ogive. The median is the corresponding value on the
‘Temperature’ axis.
Median = 16
To find Q1, draw a horizontal line from the quarter mark (1/4 × 30 = 7.5) on the
cumulative frequency axis to where it meets the ogive, then read the temperature value.
Q1 = 14
To find Q3, draw a horizontal line from the three-quarter mark (3/4 × 30 = 22.5)
on the cumulative frequency axis.
Q3 = 18
Interquartile range = Q3 − Q1
= 18 − 14
= 4
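
The readings in Example 13 can be reproduced from the frequency table with a short Python sketch. It builds the cumulative frequency column and then finds the first temperature whose cumulative frequency passes the required mark. As an assumption about how the graph is read, a mark that lands exactly on the boundary between two columns is given the midpoint of the two neighbouring temperatures. The function name value_at is illustrative only.

```python
from itertools import accumulate

# Frequency table from Example 13
temperatures = [12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
frequencies = [1, 2, 6, 2, 6, 3, 6, 1, 2, 1]

# Running total of the frequencies: [1, 3, 9, 11, 17, 20, 26, 27, 29, 30]
cumulative = list(accumulate(frequencies))

def value_at(numerator, denominator):
    """Temperature at the fraction numerator/denominator of the total frequency."""
    total = cumulative[-1]
    for i, cf in enumerate(cumulative):
        # Compare cf with (numerator/denominator) x total using whole numbers only
        if cf * denominator > numerator * total:
            return temperatures[i]
        if cf * denominator == numerator * total and i + 1 < len(temperatures):
            # The mark sits exactly on a column boundary: take the midpoint
            return (temperatures[i] + temperatures[i + 1]) / 2
    return temperatures[-1]

median = value_at(1, 2)   # halfway mark: 15
q1 = value_at(1, 4)       # quarter mark: 7.5
q3 = value_at(3, 4)       # three-quarter mark: 22.5
print(median, q1, q3, q3 - q1)   # 16 14 18 4
```

The same function gives deciles, for example value_at(4, 10) for the 4th decile.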

EXAMPLE 14

a Use the cumulative frequency graph from Example 13 to find:


i  the 4th decile, D4 ii  the 7th decile, D7.
b What value cuts off the top 20% of temperatures?
c Between which two deciles would you find a temperature of 14°C?

Solution

a The deciles are marked at intervals of 3 units on the cumulative frequency axis.
i  D4 = 16      ii  D7 = 18
b D8 cuts off the top 20% of temperatures, so the value is 18.
c Between D1 and D3.

[Graph: June temperatures in Campbelltown. The cumulative frequency histogram and ogive from Example 13, with deciles marked: D1 = 13.5, D3 = 14.5, D4 = 16, D5 = 16, D7 = 18, D8 = 18.]



EXAMPLE 15

The number of cases of ovarian cancer in women from various age groups is shown below.

Age (years) Class centre Frequency Cumulative frequency


35 – < 45 40 28  28
45 – < 55 50 61  89
55 – < 65 60 65 154
65 – < 75 70 92 246
75 – < 85 80 74 320

Draw an ogive for this data and use it to find an estimate for:
a  the median b  the 3rd quartile
c  the 9th decile d  the interquartile range.

Solution
[Graph: Cases of ovarian cancer. Ogive with Age (years) from 35 to 85 on the horizontal axis and Cumulative frequency from 0 to 320 on the vertical axis. Marked on the graph: Q1 = 53, Median = 66, Q3 = 74, D9 = 80.]

All these values are estimates because the data has been grouped into class intervals.

a Halfway point on the ‘Cumulative frequency’ axis = 160
Median ≈ 66    Estimating from the ‘Age’ axis



b The three-quarter point on the ‘Cumulative frequency’ axis = 3/4 × 320 = 240
Q3 = 74

c 90% point on the ‘Cumulative frequency’ axis = 0.9 × 320
= 288
D9 ≈ 80

d Quarter point on the ‘Cumulative frequency’ axis = 1/4 × 320
= 80
Q1 = 53
Interquartile range = Q3 − Q1 = 74 − 53
= 21
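
Reading an ogive for grouped data amounts to interpolating along a straight line within one class interval. The Python sketch below does this for the table in Example 15, assuming the ogive starts at zero at the lower boundary of the first class and joins the cumulative frequencies at each upper class boundary. Because the values in Example 15 were read from the graph by eye, the computed estimates differ from them by up to about one year. The function name estimate is illustrative only.

```python
from itertools import accumulate

# Class boundaries and frequencies from Example 15
boundaries = [35, 45, 55, 65, 75, 85]
frequencies = [28, 61, 65, 92, 74]

# Cumulative frequency at each boundary, starting from 0 at age 35
cumulative = [0] + list(accumulate(frequencies))   # [0, 28, 89, 154, 246, 320]

def estimate(fraction):
    """Estimate the age at the given fraction of the total frequency."""
    target = fraction * cumulative[-1]
    for i in range(1, len(cumulative)):
        if cumulative[i] >= target:
            # Interpolate along the straight line across this class interval
            low_cf, high_cf = cumulative[i - 1], cumulative[i]
            low_age, high_age = boundaries[i - 1], boundaries[i]
            return low_age + (target - low_cf) / (high_cf - low_cf) * (high_age - low_age)

for name, fraction in (("Q1", 0.25), ("Median", 0.5), ("Q3", 0.75), ("D9", 0.9)):
    print(name, round(estimate(fraction), 1))
```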

Exercise 10.05  Cumulative frequency graphs


1 A sample of households was surveyed on the number of TVs owned. (See Example 13)

TVs owned Frequency Cumulative frequency
1 1
2 7
3 9
4 6
5 0
6 1

a Copy the table and complete the cumulative frequency column to find the median.
b Construct a cumulative frequency histogram and polygon.
c Use the graphs you drew in part b to find:
   i the median
ii the interquartile range.

2 This ogive shows the speeds of motor vehicles travelling along the main street of a town. (See Example 14)

[Graph: Speed of motor vehicles on main street. Ogive with Speed (km/h) from 10 to 80 on the horizontal axis and Cumulative frequency from 0 to 25 on the vertical axis.]

a How many vehicles were in the survey?
b Estimate the median speed of the vehicles.
c Estimate the interquartile range.
d Estimate the 9th decile.



3 A packet of jelly beans is labelled ‘Contents 30’ but a quality control check found the
results shown in the table.

Number of jelly beans Frequency Cumulative frequency


28  6
29 34
30 56
31 28
32  5
33  1

a Copy the table and complete the cumulative frequency column.


b Construct an ogive and use it to find an estimate of:
  i the median
   ii the interquartile range
iii the 4th decile.

4 The heights of 50 students were measured and grouped into class intervals. (See Example 15)
Height (cm) Class centre Frequency Cumulative frequency
134 – < 141  2
141 – < 148  3
148 – < 155  4
155 – < 162 13
162 – < 169 15
169 – < 176 11
176 – < 183 2

a Copy and complete the table.


b What is the modal class?
c What is the median class?
d Construct an ogive and use it to estimate:
i the median ii the interquartile range iii the 7th decile.



10.06  Box plots
A box plot (or box-and-whisker plot) displays the quartiles of a set of data and the lowest
and highest scores. The ‘box’ represents the middle 50% of scores and the interquartile
range, while the ‘whiskers’ represent the lowest and highest 25% of scores.
A box plot gives a five-number summary of a data set:
• the lower extreme (lowest score)
• the lower quartile, Q1
• the median, Q2
• the upper quartile, Q3
• the upper extreme (highest score).

[Diagram: a box plot labelled with the lower extreme, Q1, median, Q3 and upper extreme. The box spans the interquartile range (the middle 50% of scores). One whisker covers the bottom 25% of scores and the other covers the top 25%.]

EXAMPLE 16

The ages of 10 people at a park were:

21 13 64 75 35 83 7 71 18 29
a Find a five-number summary for this data.
b Represent this data on a box plot.

Solution

a In order:
7 13 18 21 29 35 64 71 75 83

Lower extreme = 7      Lower quartile (Q1) = 18    Median (Q2) = (29 + 35) ÷ 2 = 32
Upper quartile (Q3) = 71      Upper extreme = 83
The five-number summary for the ages is 7, 18, 32, 71, 83.

b [Box plot: Age (years), axis from 0 to 90.]
This box plot shows that, roughly:
• the bottom 25% of scores lie from 7 to 18
• the next 25% of scores lie from 18 to 32
• the median is 32
• the top 25% of scores lie from 71 to 83.
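
The five-number summary in Example 16 can be checked with a short Python sketch. It assumes the chapter's quartile method (Q1 and Q3 are the medians of the lower and upper halves of the ordered scores); the function name is illustrative only.

```python
from statistics import median

def five_number_summary(scores):
    """Return (lower extreme, Q1, median, Q3, upper extreme)."""
    ordered = sorted(scores)
    half = len(ordered) // 2      # the middle score is left out of both halves if n is odd
    return (ordered[0],
            median(ordered[:half]),
            median(ordered),
            median(ordered[-half:]),
            ordered[-1])

# Ages of 10 people at a park, from Example 16
ages = [21, 13, 64, 75, 35, 83, 7, 71, 18, 29]
print(five_number_summary(ages))   # (7, 18, 32.0, 71, 83)
```

These five values are exactly what is needed to draw the box plot: the extremes give the ends of the whiskers, Q1 and Q3 give the ends of the box, and the median gives the line inside the box.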



EXAMPLE 17

This box plot represents the amount of pocket money in dollars earned by a
sample of 48 children.

5 10 15 20 25 30 35 40 45
Pocket money ($)
a Find the median.
b Find the range.
c How many children earned between:
   i  $33 and $42? ii  $15 and $42?
d Find the interquartile range.

Solution

a Median = $22

b Range = $42 − $7 = $35

c    i  1/4 × 48 children = 12 children        Top 25%
ii  3/4 × 48 children = 36 children        Top 75%
d Interquartile range = $33 − $15 = $18

Parallel box plots
Parallel box plots can be used to represent two or more sets of data. They are drawn on the
same scale above each other.

EXAMPLE 18

The mean maximum monthly temperatures for Sydney and Melbourne are
shown in this table.

Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Sydney 25.9 25.8 24.8 22.5 19.5 17.0 16.4 17.9 20.1 22.2 23.7 25.2
Melbourne 26.0 25.8 23.9 20.3 16.7 14.1 13.5 15.0 17.3 19.7 22.0 24.2
© Copyright Commonwealth of Australia 2017, Bureau of Meteorology



a Find the five-number summary for each city.
b Draw a parallel box plot to display the data.
c For each city, find:
i  the range ii  the interquartile range.
d Compare the temperatures for both cities. Are there significant differences between
the spread of the temperatures for Sydney and Melbourne?

Solution

a In order:
Sydney
16.4 17.0 17.9 19.5 20.1 22.2 22.5 23.7 24.8 25.2 25.8 25.9

Lower extreme = 16.4
Lower quartile = (17.9 + 19.5) ÷ 2 = 18.7
Median = (22.2 + 22.5) ÷ 2 = 22.35
Upper quartile = (24.8 + 25.2) ÷ 2 = 25.0
Upper extreme = 25.9

Melbourne
13.5 14.1 15.0 16.7 17.3 19.7 20.3 22.0 23.9 24.2 25.8 26.0

Lower extreme = 13.5
Lower quartile = (15.0 + 16.7) ÷ 2 = 15.85
Median = (19.7 + 20.3) ÷ 2 = 20.0
Upper quartile = (23.9 + 24.2) ÷ 2 = 24.05
Upper extreme = 26.0



b Sydney

Melbourne

13 14 15 16 17 18 19 20 21 22 23 24 25 26
Temperature (°C)

c    i  Sydney: Range = 25.9 − 16.4 = 9.5
   Melbourne: Range = 26.0 − 13.5 = 12.5
ii  Sydney: Interquartile range = 25.0 − 18.7 = 6.3
   Melbourne: Interquartile range = 24.05 − 15.85 = 8.2
d The range of temperatures in Melbourne is 3°C more than that of Sydney and the IQR
is 1.9°C more, so there is a significant difference. Sydney’s mean maximum monthly
temperatures are more consistent than Melbourne’s.

Exercise 10.06  Box plots

1 Tom’s scores for the 18 holes of a golf course were: (See Example 16)
3 4 6 8 7 9 5  9 11
5 7 4 5 8 6 9 10  5
a Find a five-number summary for this data.
b Represent this data on a box plot.

2 Fifteen job applicants took a short general knowledge multiple-choice quiz. Their times,
in seconds, to complete this test were as shown below. Show this data on a box plot.
45 37 46 34 26 15 35 61
43 48 52 38 30 44 37

3 Find a five-number summary for the data in this stem-and-leaf plot of ages of people
at the cinema, then draw a box plot for them.

Stem Leaf
1 4 7 7 8
2 6 8 9 9
3 1 3 5 5 7 8
4 0 2 2 4 5
5 3 7 8
6 2 9



4 This box plot illustrates the number of cigarettes smoked per day by a sample of
60 smokers who are trying to quit. (See Example 17)

[Box plot: Cigarettes smoked per day, vertical axis from 0 to 40.]

a What is the median number of cigarettes smoked per day?
b What is the interquartile range?
c What is the lower extreme?
d How many people smoked between 20 and 25 cigarettes per day?
e How many people smoked fewer than 20 cigarettes per day?

5 This dot plot shows the number of vehicles driving past Westvale High School
per minute in a 20-minute period.

2 3 4 5 6 7 8 9 10
Number of vehicles per minute

a Find the five-number summary for this data and draw a box plot.
b Compare the box plot you drew in part a with the original dot plot. Which one do
you prefer? Why?

6 This box plot represents the annual wages (× $1000) of the administration staff at a
TAFE college.
Annual wages of TAFE administrative staff

10 20 30 40 50 60 70 80
Wages (× $1000)

a One of the wages is an outlier which was not included in the box plot.
What is the outlier?
b What is the median wage?
c Excluding the outlier, what is the range of wages?
d Including the outlier, what is the range of wages?
e Between what two amounts are the middle 50% of staff wages?
f What percentage of the staff earn less than $28 000?



7 In Year 11, the results of the first assessment task of 40 students who do both Modern
History and Geography are displayed on the parallel box plot below. (See Example 18)

Geography

Modern History

35 40 45 50 55 60 65 70 75 80 85 90 95
Marks

a Find the five-number summary for each subject.


b For each subject, find:
i the range ii the interquartile range.
c What is the median for each subject?
d Which subject has the least spread? Give reasons.
e How many students scored between 60 and 75 in:
i Geography? ii Modern History?
f In which subject did the Year 11 students perform better? Give reasons.

8 Year 12 students at Baramvale High had their pulse taken. The results are as follows.

Male 106 70 69 58 60 68 64 63 75 70 84 88 59 60 66
Female 68 74 59 75 74 82 82 71 120 55 77 91 73 60 79

a Find the five-number summary for each group and draw a parallel boxplot to show
the information.
b Find the range and interquartile range for each group.
c Compare the spread between the two groups. Are there significant differences
between the pulse rates for males and females?
d Which group had the lower pulse rates? Give reasons.

9 The box plot shows the results of tests in Physics and Chemistry.
Physics

Chemistry

30 40 50 60 70 80 90
Marks

In Chemistry, 48 students completed the yearly exam and the number of students who
scored 50 or more was the same for both subjects.

How many students completed the Physics exam? Select A, B, C or D.


A 24 B 12 C 54 D 72



10 Fifteen people at a health centre had their reaction times (in seconds) tested, first using
their dominant hand and then their non-dominant hand. The results are shown in the table.

Dominant hand Non-dominant hand
0.41 0.48
0.31 0.34
0.38 0.38
0.50 0.45
0.38 0.38
0.33 0.35
0.36 0.30
0.46 0.45
0.29 0.9
0.44 0.41
0.52 0.50
0.43 0.41
0.37 0.40
0.31 0.34
0.32 0.35

a Find the five-number summary for both sets of results and draw a parallel box
plot to display the data.
b Find the range and interquartile range for the dominant hand and the non-dominant hand.
c Are there significant differences between the two sets of results?

10.07  Standard deviation
Standard deviation is a better measure of spread than the range and interquartile range
because, like the mean, its value depends on every score in the data set. Standard deviation
measures how different each score in a data set is from the mean.
The formula for calculating standard deviation is quite complicated, and does not need to be
learnt. Instead, you can use your calculator’s statistics mode.

EXAMPLE 19

Calculate, correct to one decimal place, the standard deviation of each data set below.
a The maximum daily temperature (in °C) in Mudgee for the first two weeks in
January:

30 28 26 31 34 35 32
33 21 25 28 32 32 35

b The body temperatures (in °C) of a group of hospital patients:

36 37 38 39 40 41 42 °C
Patients’ temperatures



Solution

a σ = 3.9434… ≈ 3.9
Refer to page 404 to enter the data.
Operation: Calculate the population standard deviation (σx = 3.9434…)
Casio Scientific: SHIFT 1 Var σx =
Sharp Scientific: RCL σx

b σ = 1.5362…
≈ 1.5

To calculate the standard deviation of data presented in a frequency table, refer to the table of
calculator instructions on page 407, then follow the instructions from part a above.
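
For readers working on a computer rather than a calculator, the result in Example 19a can be checked with the Python sketch below. It computes the population standard deviation in two ways: directly, as the square root of the average squared distance of the scores from the mean, and with the standard library function statistics.pstdev. The variable names are illustrative only.

```python
from math import sqrt
from statistics import pstdev

# Maximum daily temperatures (in °C) in Mudgee from Example 19a
mudgee = [30, 28, 26, 31, 34, 35, 32, 33, 21, 25, 28, 32, 32, 35]

# Direct calculation: square root of the mean squared deviation from the mean
mean = sum(mudgee) / len(mudgee)
sigma = sqrt(sum((x - mean) ** 2 for x in mudgee) / len(mudgee))

print(round(sigma, 1))            # 3.9
print(round(pstdev(mudgee), 1))   # 3.9
```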

EXAMPLE 20

Thirty-six people were given a concentration task and the times taken (in seconds) to
complete the exercise are shown below.

Males 32 44 44 29 40 26 64 21 65 32 42 30 66 51 53 30 55 42
Females 35 35 41 41 49 38 33 44 36 53 28 42 37 35 28 54 60 61

a Find the mean and standard deviation of each group.


b Is there a significant difference between the times it took to complete the exercise for
males and females? Give reasons.

Solution

a Using the calculator’s statistics mode:
Males: x̄ = 42.56, σ = 13.58
Females: x̄ = 41.67, σ = 9.72

b The mean time to complete the task for females was only 0.89 seconds lower than for
males. However, the standard deviation for females was 3.86 seconds lower than that
for males, showing that the times for females were more consistent than for males.



Samples and populations
Governments and businesses do not make important decisions based on just one sample.
Researchers generally take a number of samples from a population and calculate the
statistics of each sample. The sample means and standard deviations are then used to estimate
the population mean and standard deviation respectively.
The sample mean, x̄, and the sample standard deviation, s or sx, are called statistics.
The population mean, µ (the Greek letter ‘mu’), and the population standard deviation, σ
or σx (the Greek letter ‘sigma’) are called parameters.
The sample statistics are estimates of the population parameters.
When calculating the standard deviation of a set of data, we will usually use the population
standard deviation, σ. If the set of data is a sample, however, then use the sample standard
deviation, s, to estimate the results for a population.

EXAMPLE 21

The ages of 60 people working at Burger Haven this year are:

18 19 18 17 20 20 24 15 24 19
15 35 15 24 22 19 15 17 23 29
15 40 21 17 20 22 23 21 24 23
22 16 36 15 16 24 16 15 19 15
34 19 45 20 15 21 24 27 19 33
18 27 15 30 15 34 17 29 25 17

a Find, correct to one decimal place, the population mean (µ) and population standard
deviation (σ) of the Burger Haven employees.
b Randomly select three samples of ten ages from this population of employees and for
each sample, calculate (correct to one decimal place) the mean (x̄) and the standard
deviation (s).
c Estimate the mean and standard deviation of the population from the statistics of the
three samples.
d How do the estimates of population mean and standard deviation compare with the
answers in part a?

Solution

a µ = 21.8666… ≈ 21.9 years


σ = 6.7908… ≈ 6.8 years



b Randomly select three samples of ten ages from the list above. For example:
Sample 1: 21 16 15 15 16 30 19 30 35 24
Sample 2: 19 23 21 16 21 15 20 36 40 15
Sample 3: 18 15 25 27 20 15 24 16 17 21
The mean and standard deviation for the samples are:
Sample 1:  x̄ = 22.1   s = 7.309… ≈ 7.3
Sample 2:  x̄ = 22.6   s = 8.604… ≈ 8.6
Sample 3:  x̄ = 19.8   s = 4.341… ≈ 4.3

c Estimate of the population mean = (22.1 + 22.6 + 19.8) ÷ 3 = 21.5
Estimate of the population standard deviation = (7.3 + 8.6 + 4.3) ÷ 3 ≈ 6.7

d The estimates of the population mean and standard deviation (21.5 and 6.7) compare
favourably with the population mean and standard deviation (21.9 and 6.8).
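
The sample calculations in Example 21 can be reproduced with the Python sketch below. It uses statistics.stdev, which calculates the sample standard deviation, s, and rounds each sample statistic to one decimal place before averaging, as the worked solution does. The three samples are the ones listed in part b; the variable names are illustrative only.

```python
from statistics import mean, stdev

# The three samples of ten ages chosen in Example 21, part b
samples = [
    [21, 16, 15, 15, 16, 30, 19, 30, 35, 24],
    [19, 23, 21, 16, 21, 15, 20, 36, 40, 15],
    [18, 15, 25, 27, 20, 15, 24, 16, 17, 21],
]

# Sample statistics, each rounded to one decimal place
sample_means = [round(mean(s), 1) for s in samples]   # 22.1, 22.6, 19.8
sample_sds = [round(stdev(s), 1) for s in samples]    # 7.3, 8.6, 4.3

# Average the sample statistics to estimate the population parameters
estimate_mean = round(mean(sample_means), 1)
estimate_sd = round(mean(sample_sds), 1)
print(estimate_mean, estimate_sd)   # 21.5 6.7
```

Using statistics.pstdev here instead would give the population standard deviation of each sample, which is smaller than s for the same ten ages.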

Exercise 10.07  Standard deviation

Example 19
1 The number of monthly accidents at a construction site over 8 months was:
3 0 4 2 3 0 2 2
a Calculate the mean number of accidents per month.
b Find the standard deviation for the data, correct to one decimal place.

2 An express train from Central Station was late in arriving at Homebush by the following
times (in minutes):
6 0 3 −2 5 −1 0 3 −1 6 7 1

a Find the mean, x̄.
b Calculate the standard deviation, σ, correct to two decimal places.
c Evaluate x̄ + σ and x̄ − σ, the values that are, respectively, one standard deviation
above and one standard deviation below the mean.
d How many of the given scores lie within one standard deviation of the mean, that is,
between the two values you calculated in part c?
e What percentage, correct to one decimal place, of scores were within one standard
deviation from the mean?



3 A sample of mobile phone batteries was tested for charge life (in hours).
60 73 65 84 77 64 66 73 88 90 79 81
Find, correct to two decimal places:
a the mean
b the sample standard deviation, s.

4 Blake’s weekly commissions, in dollars, for selling Internet plans were:


540 510 1100 1350 780 650 920 590 1080
Calculate for this data, correct to the nearest dollar:
a the mean b the standard deviation.

5 Students were surveyed on the number of movies they had downloaded in the last six
months, with the results shown in the frequency table.

Score (x)   Frequency (f)
0           6
1           7
2           8
3           10
4           9
5           5
6           5

a For this data, find the mean, x̄.
b Calculate, correct to one decimal place, the standard deviation.
c How many scores were within one standard deviation of the mean?
d What percentage of scores were within one standard deviation of the mean?

Note: For many large sets of data, approximately 68% (slightly more than 2/3) of the
scores lie within one standard deviation of the mean.

6 This dot plot shows the number of vehicles driving past Westvale High School every
minute for a 20-minute period.

[Dot plot: Number of vehicles per minute, axis from 2 to 10]

a Find the mean.
b Calculate, correct to two decimal places, the standard deviation.
c How many scores were within one standard deviation of the mean?
d What percentage of scores were within one standard deviation of the mean?

7 This table shows the weekly wages of employees at Great Gals electrical store,
grouped in classes of $100.

Weekly wage ($)     Class centre   Frequency
$500 – < $600                      7
$600 – < $700                      20
$700 – < $800                      36
$800 – < $900                      17
$900 – < $1000                     11
$1000 – < $1100                    3

a Copy and complete the table.
b Find, to the nearest cent, an estimate for:
  i the mean
  ii the standard deviation.



Example 20
8 The heights (in cm) of male and female students in a Year 11 PDHPE class are shown.
Males 183 160 178 179 171 175 184 172 173 187 179 165
Females 172 160 162 160 173 165 165 163 168 150 160 177

a Find the mean height and sample standard deviation for males and for females.
b Is there a significant difference between the heights of males and females? Give reasons.

9 The results of the first two Maths tests given to a Year 11 class are displayed in the
back-to-back stem-and-leaf plot.

       Test 1 | Stem | Test 2
            4 |  3   | 2
          4 3 |  4   | 9
        9 8 0 |  5   | 2 7 9
    9 8 7 4 0 |  6   |
9 7 5 5 5 3 1 |  7   | 0 1 1 2 4 4 8
          9 9 |  8   | 0 1 2 4 5 5 7 8

a Find the mean mark and standard deviation for each test.
b Are there significant differences between the means and standard deviations of the
two tests?
c In which test did the students perform better? Justify your answer.

10 A group of men and women recorded the length (in seconds) of the last call they made
on their mobile phones.

Men     292 360 840 60 60 900 60 328 217 16
        1565 58 22 98 73 537 51 49 1210 15
Women   653 73 202 58 74 75 58 168 354 600
        1560 2220 56 900 481 60 139 80 72 110

a Find the mean and standard deviation for each group.
b Calculate the mean and standard deviation of the times for men and women if the
outliers (1565 s and 1210 s for men, 1560 s and 2220 s for women) are excluded.
c Do men or women make longer calls? Justify your answer.

Example 21
11 a As in Example 21, randomly select three samples of ten ages from the population
of Burger Haven employees and, for each sample, calculate the mean (x̄) and the
sample standard deviation (s).
b Estimate the mean and standard deviation of the population from the statistics of
the three samples.
c How do the estimates of population mean and standard deviation compare with the
population values found in part a of Example 21?

12 a Randomly select three samples of five ages from the Burger Haven employees and,
for each sample, calculate the mean (x̄) and the sample standard deviation (s).
b Estimate the mean and standard deviation of the population from the statistics of
the three samples.
c How do the estimates of population mean and standard deviation compare with the
population values found in part a of Example 21?



13 Using your results from Questions 11 and 12, do the sample statistics become more
accurate and closer to the values of the population mean and standard deviation with a
larger sample size?
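
The note in Question 5 says that, for many large data sets, about 68% of scores lie within one standard deviation of the mean. The sketch below shows one way such a count could be made, using the Burger Haven ages from Example 21 as the data. The variable names are illustrative, and this fairly small, skewed data set should not be expected to give exactly 68%.

```python
import statistics

# Ages of the 60 Burger Haven employees from Example 21.
ages = [
    18, 19, 18, 17, 20, 20, 24, 15, 24, 19,
    15, 35, 15, 24, 22, 19, 15, 17, 23, 29,
    15, 40, 21, 17, 20, 22, 23, 21, 24, 23,
    22, 16, 36, 15, 16, 24, 16, 15, 19, 15,
    34, 19, 45, 20, 15, 21, 24, 27, 19, 33,
    18, 27, 15, 30, 15, 34, 17, 29, 25, 17,
]

mu = statistics.mean(ages)
sigma = statistics.pstdev(ages)

# The interval from one standard deviation below to one above the mean.
low = mu - sigma
high = mu + sigma

within = [age for age in ages if low <= age <= high]
share = 100 * len(within) / len(ages)

print(f"Interval: {low:.2f} to {high:.2f}")
print(f"{len(within)} of {len(ages)} ages ({share:.1f}%) lie within one standard deviation")
```

The same steps (find the mean and standard deviation, form the interval, count the scores inside it) apply to the data in Questions 2, 5 and 6.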

TECHNOLOGY
Calculating measures of spread
Step 1: Open a blank spreadsheet and enter the temperature data about Mudgee from
Example 19 on page 445.

Step 2: In cell E5, enter the formula =MAX(A2:G3) to calculate the
highest score (35).
Step 3: In cell E6, enter =MIN(A2:G3) to calculate the lowest score (21).
Step 4: In cell E7, enter =QUARTILE(A2:G3,3) to calculate the upper quartile, Q3 (32.75).
Step 5: In cell E8, enter =QUARTILE(A2:G3,1) to calculate the lower quartile, Q1 (28).
Note: A spreadsheet calculates quartiles using a slightly different method to the method
we have described, so its answers for the interquartile range may not be exactly the
same as ours, but they should be close.
Step 6: In cell E10, enter =E5-E6 to calculate the range (14).
Step 7: In cell E11, enter =E7-E8 to calculate the interquartile range (4.75).
Step 8: In cell E12, enter =STDEV.P(A2:G3) to calculate the population standard
deviation.
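
The same measures of spread can also be found in a programming language. The sketch below uses Python's standard `statistics` module on the Burger Haven ages from Example 21 (not the Mudgee temperatures). It assumes that `method="inclusive"` interpolates quartiles in the same way as the spreadsheet's QUARTILE function; for this data set the choice of method does not change the quartiles.

```python
import statistics

# Ages of the 60 Burger Haven employees from Example 21.
ages = [
    18, 19, 18, 17, 20, 20, 24, 15, 24, 19,
    15, 35, 15, 24, 22, 19, 15, 17, 23, 29,
    15, 40, 21, 17, 20, 22, 23, 21, 24, 23,
    22, 16, 36, 15, 16, 24, 16, 15, 19, 15,
    34, 19, 45, 20, 15, 21, 24, 27, 19, 33,
    18, 27, 15, 30, 15, 34, 17, 29, 25, 17,
]

highest = max(ages)  # plays the role of =MAX(...)
lowest = min(ages)   # plays the role of =MIN(...)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")

data_range = highest - lowest
iqr = q3 - q1
sigma = statistics.pstdev(ages)  # plays the role of =STDEV.P(...)

print(f"Highest = {highest}, lowest = {lowest}, range = {data_range}")
print(f"Q1 = {q1}, Q3 = {q3}, interquartile range = {iqr}")
print(f"Population standard deviation = {sigma:.1f}")
```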



10.08  The shape of a distribution
The shape of a statistical distribution (data set) shows how the data is spread, and can
be seen by drawing a curve around its graph or display.

A distribution is symmetrical if the data are balanced or evenly spread about the centre
of the distribution, with the mean, median and mode being equal. One example of a
symmetrical distribution is students’ marks in an HSC examination.
[Figure: Symmetrical distribution, frequency against score, with the mean, median and mode together at the centre]

A distribution is positively skewed if its tail points to the right (the positive
direction). The high scores in the tail pull the mean above the mode and median.
One example of a positively skewed distribution is house prices in a large country town.
[Figure: Positively skewed distribution, frequency against score, with the mode, median and mean in that order from left to right]

The word ‘skewed’ means twisted.

If a distribution is negatively skewed, then its tail points to the left (the negative
direction). The low scores in the tail pull the mean below the mode and median.
One example of a negatively skewed distribution is the heights of the players in a
basketball team.
[Figure: Negatively skewed distribution, frequency against score, with the mean, median and mode in that order from left to right]

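The link between skewness and the order of the mean, median and mode can be seen in a small numerical example. The scores below are hypothetical, invented only for illustration, and the helper `centre_measures` is not from the textbook. Comparing the mean with the median is a rough guide to skewness, not a formal test.

```python
import statistics

def centre_measures(data):
    """Return the mean, median and mode of a data set as a tuple."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

# Hypothetical scores: most are low, with a few high scores forming a tail to the right.
right_tail = [2, 2, 2, 3, 3, 4, 5, 9, 15]

# Reflecting each score (17 - score) moves the tail to the left.
left_tail = [17 - score for score in right_tail]

for label, data in [("Tail to the right", right_tail), ("Tail to the left", left_tail)]:
    mean, median, mode = centre_measures(data)
    print(f"{label}: mean = {mean}, median = {median}, mode = {mode}")
```

For the first data set the mean (5) is above the median (3) and the mode (2), matching the description of positive skew. For the reflected data set the order is reversed.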
Peaks are the high points of the distribution and represent the
more frequent scores. The highest peak is the mode.
The modality is the number of peaks occurring in a distribution. A distribution can have one
peak only (unimodal) or have more than one peak (multimodal).
[Figures: a unimodal distribution (one peak) and a multimodal distribution (more than one peak), each showing frequency against score]

If a distribution is bimodal, it has two peaks. For example, this frequency histogram is
bimodal, having two peaks at 2 and 7. The mode, however, is 7.

[Frequency histogram: scores from 1 to 11, with peaks at 2 and 7]

Clusters are groups of scores that are bunched or close together.

EXAMPLE 22

For each distribution shown below:
i  describe its shape    ii  state the modality    iii  identify any clusters.

a  Marks in a Japanese test

Stem | Leaf
3 | 1 2
4 | 3 5 9
5 | 0 2 6
6 | 4 5
7 | 3 5 6 7 7 8 9
8 | 0 2 4 6 6 8 8 9
9 | 1 2 4 8 9

b  Amount of traffic on Sydney’s roads
[Figure: amount of traffic against time]

c  Ages of children at a cinema
[Figure: Age, axis from 2 to 10]

d  Ages of people in a small coastal town
[Figure: Frequency (0 to 70) against Age, in groups 0–4, 5–9, … , 75–79, 80+]

e  Waiting time in a doctor’s surgery
[Figure: Waiting time (min), axis from 15 to 40]

Solution

a  i  negatively skewed (tail points towards the left)
   ii  multimodal, peaks at 77, 86 and 88
   iii  clusters in the 70s to 90s
b  i  positively skewed (tail points towards the right)
   ii  bimodal, 2 peaks
   iii  clusters at earlier hours
c  i  symmetrical
   ii  multimodal, peaks at 3, 5, 7 and 9
   iii  no clusters
d  i  positively skewed (tail points towards the higher ages)
   ii  unimodal class, 1 peak
   iii  cluster from 15 to 29
e  i  positively skewed (tail points towards the right)
   ii  Unable to determine since individual scores are not known.
   iii  cluster from 15–17 min (25% of patients)



Comparing sets of data
Distributions or numerical data sets can be described and compared in terms of modality,
shape, measures of central tendency and spread, and outliers.

EXAMPLE 23

The daily maximum temperatures for Sydney and Brisbane for December are shown below.

Sydney
[Figure: daily maximum temperatures; axis: Temperature (°C), from 18 to 40]

Brisbane
[Figure: daily maximum temperatures; axis: Temperature (°C), from 18 to 40]

a Find the mean, the median and modal temperatures for each city.
b Find the range, interquartile range and standard deviation for each city.
c Describe the shape of the distribution of temperatures for each city and identify any
outliers and clusters.
d Compare the temperatures in Sydney and Brisbane. Comment on measures of central
tendency and measures of spread.

Solution

a Sydney: Mean = 28.2°C, Median = 27°C, Mode = 27°C
Brisbane: Mean = 29.9°C, Median = 30°C, Mode = 30°C

b Sydney: Range = 38° − 19° = 19°
IQR = Q3 − Q1 = 30 − 25 = 5
Standard deviation = 4.6
Brisbane: Range = 34° − 24° = 10°
IQR = Q3 − Q1 = 31 − 28 = 3
Standard deviation = 2.1

c Sydney’s distribution of temperatures is positively skewed, and 38°C is only just an
outlier because it is above Q3 + 1.5 × IQR = 30 + 1.5 × 5 = 37.5.
Brisbane’s temperatures have a slight positive skew and no outliers.
Sydney’s temperatures are bimodal, with peaks at 24°C and at 27°C, and are clustered
at 27−30°C. Brisbane’s temperatures are also bimodal, with peaks at 28°C and at 30°C,
and are clustered at 30°C.

d Brisbane is the warmer city, as shown by the mean, median and mode, which are 2–3°
above those of Sydney.
The spread of Sydney’s temperatures is significantly greater than Brisbane’s, as shown
by larger values of the range, interquartile range and standard deviation. Sydney also
had the lowest and the highest temperatures in December.
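
The outlier check in part c can be written as a short script. This is a sketch under stated assumptions: the function names `outlier_fences` and `is_outlier` are invented for illustration, and the lower limit Q1 − 1.5 × IQR is assumed as the usual counterpart of the upper limit Q3 + 1.5 × IQR used in the solution. The quartiles and extreme temperatures are taken from parts a and b.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) limits beyond which a score counts as an outlier."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(score, q1, q3):
    """True if the score lies outside the limits given by the quartiles."""
    lower, upper = outlier_fences(q1, q3)
    return score < lower or score > upper

# Sydney: Q1 = 25, Q3 = 30, lowest = 19, highest = 38 (from part b).
print(outlier_fences(25, 30))   # (17.5, 37.5)
print(is_outlier(38, 25, 30))   # True: 38 is just above 37.5
print(is_outlier(19, 25, 30))   # False

# Brisbane: Q1 = 28, Q3 = 31, lowest = 24, highest = 34 (from part b).
print(outlier_fences(28, 31))   # (23.5, 35.5)
print(is_outlier(34, 28, 31))   # False
print(is_outlier(24, 28, 31))   # False
```

Only the lowest and highest scores need checking to decide whether a data set has any outliers, because every other score lies between them.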

Exercise 10.08  The shape of a distribution

Example 22
1 This dot plot shows the judges’ scores in a diving competition. Which of the following
statements is true about the distribution? Select A, B, C or D.

[Dot plot: judges’ scores, axis from 0 to 10]

A The data is positively skewed with a cluster around 6 to 8.


B The data is symmetrical with no modes.
C The data is negatively skewed with one mode.
D The data is positively skewed with a cluster around 0 to 4.



2 For each distribution:
i describe its shape    ii state the modality    iii identify any clusters

a [Figure: Frequency (0 to 12) against Score (1 to 11)]

b [Figure: axis from 4 to 9]

c
Stem | Leaf
1 | 3 4 6 6 6 7 8 9 9
2 | 0 7
3 | 1 2 2 5 7 8 8 9
4 | 0 2 3
5 | 2 9

d [Figure: axis from 10 to 20]

e
Stem | Leaf
4 | 1 3
5 | 5 5 6
6 | 0 3 5 5 6 8
7 | 2 6
8 | 5 5 8

f [Figure: Frequency (0 to 9) against Score (5 to 50, in steps of 5)]

3 This stem-and-leaf plot shows the number of mobile phones sold in January across
various OzTel stores in Australia.

Stem | Leaf
2 | 2 6
3 | 0 1
4 | 4 8
5 | 2 6 9
6 | 1 3 4 5
7 | 0 2 3 4 4 5 5 7 7 7 8 9
8 | 3 5 7 7 8 8 8 8 9
9 | 2 8

a How many OzTel stores were surveyed?
b Describe the shape of the data.
c Where does the clustering occur?
d What is the mode?



4 The number of visits to the MyFace website was recorded between 1200 (noon) and
2100 (9 p.m.) one day.

Hour        Hits
1201–1300   1300
1301–1400   800
1401–1500   400
1501–1600   2100
1601–1700   2500
1701–1800   4500
1801–1900   3900
1901–2000   5300
2001–2100   2300

a Draw a histogram to represent this data.


b Comment on the shape of your histogram, also referring to modality and clusters.
c Suggest a possible reason for the skewness of this data.

5 Which statement is true about the data sets below? Select A, B, C or D.

[Figures: data sets X and Y, each displayed on an axis from 3 to 7]

A Y is positively skewed.
B X does not have a mode.
C The mean of Y is 5.
D X and Y are both symmetrical.

6 These are the ages of employees at the Berry Good Biscuit factory.
16  36  15  16  15  19  55  59  18  20  50  22  21  35  22  19  15  17  43  49
a Draw a stem-and-leaf plot for this data.
b Comment on the shape of the distribution, mentioning skewness, peaks and
clusters.

7 This dot plot represents the number of accidents per month at a factory over a year.

[Dot plot: Accidents/month, axis from 0 to 9]

a Comment on the shape of the dot plot.


b What is the mode?
c Calculate the mean (correct to one decimal place) and compare it to the mode.



Example 23
8 This back-to-back stem-and-leaf plot compares the half-yearly exam marks of two
Year 11 Business Studies classes.

      11BS1 | Stem | 11BS2
          8 |  3   | 4
      9 7 7 |  4   | 5 8
9 7 7 6 6 3 |  5   | 2 7 9
    9 8 7 4 |  6   | 3 4 4 6 7
      7 3 2 |  7   | 1 1 2 3 4 6 8
        5 3 |  8   | 2 4

a Find the mean, median and mode for each class.
b Find the range, interquartile range and standard deviation of the marks for each
class.
c Describe the shape of the distribution of marks for each class.
d Compare the marks for both classes and determine which class achieved better
results, commenting on shape, measures of central tendency and measures of
spread.

9 The results of a Year 12 Maths exam are shown on the parallel box plot
below.

[Parallel box plots for classes 12W and 12X; axis: Test results, from 20 to 80]

a What is the median result for each class?


b Find the range and interquartile range for each class.
c Describe the shape of the results for each class.
d Which class had the better test results? Give reasons.

10 A Year 11 Biology class was asked to estimate their test results before completing the
test. The estimates and actual test results are shown below.

Estimates 87 80 83 65 82 82 92 73 82 89
93 77 70 65 85 33 87 77 78 75
88 89 86 58
Test results 80 73 86 52 91 91 72 64 91 87
79 46 78 85 82 32 87 73 79 86
95 79 49 73

a Display the data in a back-to-back stem-and-leaf plot.


b Comment on the shape of each set of data, mentioning skewness, modality and
clusters.



c For each group of results, find:
i the mean ii the median iii the mode.
d For each data set, find:
 i the range ii the interquartile range
iii the standard deviation.
e Compare the two sets of results. Did the students overestimate their results?
Justify your answer.

SAMPLE HSC PROBLEM

The ages, in years, of a sample of patients at a hospital are shown in the stem-and-leaf
plot.

Stem | Leaf
1 | 2 2 3 4 6
2 | 1 2
3 | 0 0 0 3
4 | 4 7 8
5 | 1 1
7 | 5 7 8
8 | 1

a Find the mean age of the patients.
b Find the median age of the patients.
c Is the mean or median more appropriate for describing the average age of the
patients? Give a reason for your answer.
d Find the interquartile range of the patients’ ages.
e Represent this data set on a box plot.

Study tip
Looking after yourself
• While studying, don’t forget to keep it all in perspective.
• Remember to have your own life outside school.
• Look after your physical and mental health.
• Eat properly and have enough sleep.
• Exercise regularly, play sport and go out.
• Plan to do nothing occasionally.
• Relax and rest regularly.
• Talk to your family, visit your friends.
• Be positive and sensible.
• Have confidence in yourself and don’t stress.
• Don’t worry, be happy.



10. CHAPTER SUMMARY

This chapter, Analysing data, examined the statistical measures of central tendency (mean,
median, mode) and spread (range, interquartile range, standard deviation). You should
be competent at making statistical calculations on sets of numerical data, including those
represented in frequency tables, class intervals (grouped data), dot plots and stem-and-leaf
plots. Make sure you know how to use the statistical functions of your calculator. You
should understand the new concepts of quantiles (quartiles, deciles and percentiles), be
able to interpret cumulative frequency graphs and construct box plots using a five-number
summary. You must also be able to describe, compare and interpret data sets in terms of
modality, shape (symmetry and skewness), measures of central tendency and spread, and
also look at the effect of outliers.
Make a summary of this topic. Use the outline at the start of this chapter as a guide. An
incomplete mind map is shown below. Use your own words, symbols, diagrams, boxes and
reminders. Gain a ‘whole picture’ view of the topic and identify any weak areas.

[Mind map: ANALYSING DATA at the centre, with branches: Measures of central tendency; Measures of spread and outliers; Quantiles: deciles, quartiles and percentiles; Shape of data sets; Box plots; Cumulative frequency graphs; Comparing data sets]



10. TEST YOURSELF

Exercise 10.01
1 The heights (in centimetres) of a group of ballet dancers are:
165  183  170  168  175  179  168  170
181  168  172  177  171  170  175  179
a Calculate the mean, correct to one decimal place.
b Find the median height.
c What is the mode?

Exercise 10.01
2 Motor vehicles were clocked, by police radar, travelling at the following
speeds (in km/h):
78  95  64  77  81  84  77  89  90  78
79  80  82  84  80  79  95  86  84  70
78  65  82  91  89  60  85  81  78  68
90  84  69  70  80  91  85  84  80  76
68  65  85  76  79  83  82  91  84  80
a Sort the data in a frequency table using classes of 60–< 70, 70–< 80, and so on, and
include a column of class centres.
b Calculate an estimate for the mean speed.
c Find the median class of speeds.
d What is the modal class?

Exercise 10.01
3 The dot plot represents the sum of two dice rolled 20 times.
Find the mean, median and mode of this data.

[Dot plot: Sum of two dice, axis from 2 to 12]

Exercise 10.01
4 The house prices realised at auction one Saturday in Vincentia were:
$642 000 $585 000 $352 000 $1 480 000
$705 000 $415 000 $680 000 $740 000
a Calculate the mean price.
b Calculate the median price.
c Is the mean or the median the better measure to use as the average price of the
houses? Why?

Exercise 10.01
5 Which measure of central tendency is most appropriate for describing each average
below? Give a reason for each answer.
a The average men’s shoe size
b The average height of Year 11 students
c The average starting salary of an Australian worker



6 A grouped data frequency table is shown. What is the mean? Select A, B, C or D.

Class interval   Frequency
11–15            4
16–20            7
21–25            12
26–30            24
31–35            15

A 24.1    B 25.3    C 26.1    D 28.1

Exercise 10.02
7 In a national mathematics test, Simone scored 84.
a This score was above the 7th decile, D7. Approximately what percentage of students
taking the test scored lower than her?
b More specifically, Simone’s score was at the 78th percentile, P78. What percentage
of students scored higher than her?

Exercise 10.03
8 a What is the meaning of ‘interquartile range’?
b A random sample of 15 packets of corn chips had the following masses in grams.
Find the range and interquartile range of these masses.
52 51 50 49 50 50 48 51
51 50 49 53 50 49 51

Exercise 10.03
9 This stem-and-leaf plot represents the number of points per match scored by the
Sharks in a football season.

Stem | Leaf
0 | 6 6
1 | 2 3 4 4 4 8 8 9
2 | 0 0 0 5 6
3 | 0 0 2 4 4 6 7
4 | 0
5 |
6 | 2

For this data, find:
a the range
b the interquartile range.

Exercise 10.04
10 In a small business, eight employees earn the following wages per week.
$1026  $874  $950  $950  $980  $1140  $1216  $1710
Is the wage of $1710 an outlier for this set of data? Justify your answer with calculation.

Exercise 10.04
11 Consider the set of scores:
4  7  8  8  12  15  19  20
a What is the effect on the mean and median if an outlier of 40 is added to
this data set?
b Is the mean or median a better measure of central tendency when there is an outlier
in the data set?



Exercise 10.05
12 Students were surveyed about the number of pairs of shoes they owned, and the
results are shown in the table.

Pairs of shoes   Frequency
5                8
6                11
7                10
8                6
9                5

a Copy the table, adding a cumulative frequency column. Then draw a cumulative
frequency histogram and polygon.
b Use your polygon to calculate:
  i the median
  ii the interquartile range
  iii the 3rd decile.

Exercise 10.05
13 The cumulative frequency graph shows the results of an assignment marked out of 10.

[Cumulative frequency graph, titled ‘Marks in a test’: Cumulative frequency (scale marked from 12 to 36) against Mark (2 to 10)]

a How many students completed the assignment?
b Use the graph to estimate:
  i the median
  ii the interquartile range
  iii the 6th decile
  iv the 45th percentile.

Exercise 10.06
14 This box plot represents the number of goals scored per game by a hockey team
over a season.

[Box plot: Goals per game, axis from 0 to 12]

a What was the lowest score?
b Find the interquartile range.
c In what fraction of games were more than 8 goals scored?
d In what percentage of games were fewer than 5 goals scored?

Exercise 10.06
15 a Create a five-number summary for the corn chip packet masses in Question 8b.
b Represent the mass data on a box plot.



16 The parallel box plots show the distribution of marks for exams in English and History.

[Parallel box plots for English and History; axis: Marks, from 10 to 100]

a Which subject has the smaller spread of marks? Give reasons.


b The number of students who scored 70 or less is the same for both subjects.
If 144 students did the English exam, how many students did the History exam?

Exercise 10.07
17 For quality testing, a manufacturer takes a random sample of 10 screws, each designed to
have a length of 2 cm. The actual lengths of the screws, in centimetres, are:
2.00  1.99  1.98  2.01  2.01  1.97  2.03  1.98  2.01  2.00
a Find the mean screw length.
b Find the standard deviation, correct to two decimal places.

Exercise 10.07
18 For the shoe data from Question 12, calculate (correct to one decimal place):
a the mean
b the standard deviation.

Exercise 10.08
19 The results for the multiple-choice section in two tests taken by a Year 11 Mathematics
class are shown below.

[Back-to-back frequency histograms for Test 1 and Test 2: scores 1 to 10 on the central axis, Frequency from 0 to 10 on each side]

a Find the mean, median and mode for each test.


b Describe the shape of the data set for each test.
c For each test, find:
i  the range ii  the interquartile range iii  the standard deviation.
d Are there any significant differences in the results of the two tests? Justify your
answer by referring to the measures of central tendency and spread of the tests.
