
CH 14 Statistics


Rich & Unhappy ...
or Poor & Satisfied - OPINION POLL

Which do you prefer?

Contents
14:01 Review of statistics
Investigation 14:01 Adding and averaging
14:02 Cumulative frequency
14:03 Measures of spread: Interquartile range
Fun spot 14:03 Why did the robber flee from the music store?
14:04 Box plots
Investigation 14:04 Code breaking and statistics
14:05 Comparing sets of data
Investigation 14:05 The ageing population
Maths terms, Diagnostic test, Assignments
-----------------------------------------------------------------------------------------------
Syllabus references (See pages x-xv for details.)
Statistics and Probability
Selections from Single Variable Data Analysis [Stages 5.1, 5.2°]
• Construct back-to-back stem-and-leaf plots and histograms and describe data, using terms including 'skewed',
'symmetric' and 'bi-modal' (ACMSP282)
• Compare data displays using mean, median and range to describe and interpret numerical data sets in terms of
location (centre) and spread (ACMSP283)
• Determine quartiles and interquartile range (ACMSP248)
• Construct and interpret box plots and use them to compare data sets (ACMSP249)
• Compare shapes of box plots to corresponding histograms and dot plots (ACMSP250)

-----------------------------------------------------------------------------------------------
Working Mathematically
• Communicating • Problem Solving • Reasoning • Understanding • Fluency

In Years 7 and 8 the work in statistics concentrated on the classification, collection, organisation
and analysis of data.

The topics covered included:


• types of data: categorical and numerical (discrete or continuous)
• frequency tables and their graphs
• analysing data: range, mode, mean, median
• stem-and-leaf plots and dot plots
• sources of data: primary and secondary sources
• sampling a population: random, systematic and stratified sampling
• grouped data.
Here is a reminder of the statistical measures used and their definitions.

The range = highest score − lowest score.
The mode is the outcome that occurs the most.
The median is the middle score for an odd number of scores.
The median is the average of the middle two scores for an even number of scores.
The mean is the arithmetic average.

$$\text{Mean} = \frac{\text{sum of the scores}}{\text{total number of scores}} = \frac{\text{sum of } fx \text{ column}}{\text{sum of } f \text{ column}}$$

The Greek letter $\Sigma$ (sigma) is used for 'the sum of'. The symbol $\bar{x}$ (x bar) is used
for 'the mean'. So, a compact definition of the mean is $\bar{x} = \frac{\Sigma fx}{\Sigma f}$.
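
The four measures defined above can be sketched in a few lines of Python. This is only an illustration: the function name `summarise` is an assumption, and the two sample lists are the sets of scores used in Worked Example 1.

```python
from collections import Counter


def summarise(scores):
    """Return (range, list of modes, median, mean) for a list of scores."""
    ordered = sorted(scores)
    n = len(ordered)

    # Range = highest score - lowest score
    score_range = ordered[-1] - ordered[0]

    # Mode = outcome(s) with the highest frequency
    counts = Counter(ordered)
    highest = max(counts.values())
    modes = [score for score, count in counts.items() if count == highest]

    # Median = middle score (odd n) or average of the two middle scores (even n)
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2

    # Mean = sum of the scores / total number of scores
    mean = sum(ordered) / n
    return score_range, modes, median, mean


set_a = [4, 4, 4, 12, 9, 6, 10]
set_b = [15, 36, 40, 23, 18, 46, 21, 28, 32, 36]
print(summarise(set_a))
print(summarise(set_b))
```

The modes are returned as a list because a set of scores can have more than one outcome with the highest frequency.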

WORKED EXAMPLE 1
Find the range, mode, median and mean of each set of scores.
a 4 4 4 12 9 6 10
b 15 36 40 23 18 46 21 28 32 36

Solutions
a Range = highest score − lowest score
= 12 − 4
= 8
Mode = score occurring most
= 4
Median = middle score
= 6
Mean = $\frac{\text{sum of the scores}}{\text{total number of scores}}$
= $\frac{4 + 4 + 4 + 12 + 9 + 6 + 10}{7}$
= 7

b Range = highest score − lowest score
= 46 − 15
= 31
Mode = score occurring most
= 36
Median = average of two middle scores
= $\frac{28 + 32}{2}$
= 30
Mean = $\frac{295}{10}$
= 29·5

Australian Signpost Mathematics New South Wales 9 Stages 5.1-5.3


WORKED EXAMPLE 2
The following marks out of ten were obtained in a class quiz.
5 3 8 6 7 5 7 3 4 5 7 8 5 5 4 6 6 6 3 6 6 3 6 4 5 3 7 5 6
• Organise these scores into a frequency distribution table.
• Use the table to calculate the mode, median and mean mark for the quiz.

Solution
Mark (x)   Tally        Frequency (f)   fx
3          |||||        5               15
4          |||          3               12
5          ||||| ||     7               35
6          ||||| |||    8               48
7          ||||         4               28
8          ||           2               16
Total:                  29              154

• The mode is the score that occurs the most. Here, the mode is 6.
• The median is the middle score when the scores are arranged in order. As there
are 29 scores, the 15th score will be the middle score. Counting down the
frequency column, it can be seen that the 15th score is a 5. Hence, the median is 5.
• The mean = $\frac{\text{sum of } fx \text{ column}}{\text{sum of } f \text{ column}} = \frac{\Sigma fx}{\Sigma f} = \frac{154}{29}$
= 5·3 (1 dec. pl.)
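
As a rough cross-check of this table, the frequency and fx columns can be built in Python. This is a minimal sketch; the names `frequency_table`, `sum_f` and `sum_fx` are chosen here for illustration only.

```python
from collections import Counter

# The 29 quiz marks from Worked Example 2
marks = [5, 3, 8, 6, 7, 5, 7, 3, 4, 5, 7, 8, 5, 5, 4,
         6, 6, 6, 3, 6, 6, 3, 6, 4, 5, 3, 7, 5, 6]


def frequency_table(scores):
    """Return rows of (outcome x, frequency f, f times x), in order of x."""
    counts = Counter(scores)
    return [(x, counts[x], x * counts[x]) for x in sorted(counts)]


table = frequency_table(marks)
sum_f = sum(f for _, f, _ in table)      # total of the f column
sum_fx = sum(fx for _, _, fx in table)   # total of the fx column
mean = sum_fx / sum_f

for row in table:
    print(row)
print(sum_f, sum_fx, round(mean, 1))
```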

WORKED EXAMPLE 3
Draw a frequency histogram and polygon for the data in
Worked Example 2.

Solution
• The frequency histogram is a column graph.
• The graph has a title and the axes are labelled.
• The first column begins one-half of a column width in from the vertical axis.
• The frequency polygon is a line graph.
• The first non-zero dot is one unit in from the vertical axis.
• The dots showing the data are joined by straight lines and
joined to the horizontal axis as shown.

[Graph: 'Class quiz' frequency histogram and polygon; horizontal axis: Marks (3 to 8); vertical axis: Frequency]

[Graph: dot plot of the same data; horizontal axis: Marks (3 to 8). This dot plot looks similar to a histogram!]

14 Statistics
WORKED EXAMPLE 4
The scores for a group of 45 students on a spelling test out of 60 were:
50 41 34 25 18 8 35 45 54 14 59 28 39 42 53
34 51 38 47 21 50 9 54 57 46 10 48 34 11 40
52 8 23 42 52 46 37 27 55 17 32 41 30 25 11
a Sort this set of data into a grouped frequency distribution table with groupings of 0-9,
10-19 etc.
b Find the modal class, median class and an estimate for the mean using the class centres for
each group.
c Construct a stem-and-leaf plot and use this to find the mode, median and mean. Compare
these measures with those for the grouped data.

Solutions
a
Class    Class centre (c.c.)   Tally            Frequency (f)   f × c.c.
0-9      4·5                   |||              3               13·5
10-19    14·5                  ||||| |          6               87
20-29    24·5                  ||||| |          6               147
30-39    34·5                  ||||| ||||       9               310·5
40-49    44·5                  ||||| |||||      10              445
50-59    54·5                  ||||| ||||| |    11              599·5
Total:                                          Σf = 45         Σ(f × c.c.) = 1602·5

b Modal class = 50-59 (highest frequency of 11)
Median class = 30-39 (count down the frequency column to the middle score; 23rd out of 45)
Mean = $\frac{\Sigma(f \times \text{c.c.})}{\Sigma f} = \frac{1602·5}{45}$ = 35·6 (1 dec. pl.)

c The stem-and-leaf plot looks like this:

Stem | Leaf
0    | 889
1    | 011478
2    | 135578
3    | 024445789
4    | 0112256678
5    | 00122344579

This is an 'ordered' stem-and-leaf plot as the data has been arranged in order.

The mode is 34 as it is the score that occurs the most.
The median is 38, the 23rd score (middle score of 45 scores).
The mean = $\frac{\Sigma x}{\Sigma f} = \frac{1593}{45}$ = 35·4



Comparing these measures with those of the grouped table shows the mode and modal class
to be very different. However, the median lies within the median class and the mean is only
slightly different.
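
The comparison between the grouped estimate and the exact mean can be reproduced with a short Python sketch. The function name `grouped_mean` and its `width` parameter are assumptions made for this illustration; the class centre of a whole-number class such as 30-39 is taken as half-way between its end values (34·5), as in the table.

```python
# The 45 spelling test scores from Worked Example 4
scores = [50, 41, 34, 25, 18, 8, 35, 45, 54, 14, 59, 28, 39, 42, 53,
          34, 51, 38, 47, 21, 50, 9, 54, 57, 46, 10, 48, 34, 11, 40,
          52, 8, 23, 42, 52, 46, 37, 27, 55, 17, 32, 41, 30, 25, 11]


def grouped_mean(scores, width=10):
    """Estimate the mean using class centres for classes 0-9, 10-19, ..."""
    frequency = {}
    for score in scores:
        lower = (score // width) * width          # lower end of the class
        frequency[lower] = frequency.get(lower, 0) + 1
    # Class centre of lower..lower+width-1 is lower + (width - 1) / 2
    total = sum(f * (lower + (width - 1) / 2) for lower, f in frequency.items())
    return total / len(scores)


exact_mean = sum(scores) / len(scores)
print(round(grouped_mean(scores), 1), round(exact_mean, 1))
```

The grouped value is only an estimate because every score in a class is treated as if it sat at the class centre.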
If this data is compared with the group's results for a previous test using a back-to-back
stem-and-leaf plot, simple comparisons can be made.

Previous test          Latest test
       Leaf | Stem | Leaf
      97654 |  0   | 889
 9865543321 |  1   | 011478
98654442100 |  2   | 135578
 9986653321 |  3   | 024445789
   98753220 |  4   | 0112256678
          8 |  5   | 00122344579

The mode for the previous test was 24.
The median was 25 (the 23rd of the 45 scores).
The mean = $\frac{\Sigma x}{\Sigma f} = \frac{1206}{45}$ = 26·8.
There also appeared to be an outlier score of 58.

In particular, the median and mean indicated significant overall improvement in the scores.
This can also be seen by comparing the distribution of the scores in the back-to-back plot.

Reminder: How to use a calculator to find the mean


• For sets of individual scores, simply add the scores in the calculator and then divide this
sum by the number of scores.
The following instructions are for a Casio fx-82AU PLUS. Consult your calculator's manual if
you have a different model. Most calculators have a statistics mode that enables the use of various
statistical functions. The 'mean' function, which is usually labelled as x̄, is used here.
To find the mean for grouped scores:

Outcome   Frequency
0         2
1         3
2         4
3         7
4         3
5         1

• Firstly press [SHIFT] then [SET UP] and toggle to the second screen
of functions. Select 3:STAT. Then select 1:ON to choose a frequency column.
• Press [MODE] then select 2:STAT to enter statistics mode.
• Select 1:1-VAR and two columns should appear on the screen.
• Enter the scores in the 'Outcome' column labelled 'x' using the [=] key.
• Toggle across to the second column labelled 'FREQ' and up to
the top of the column and enter the frequency for each score.
Now this step is IMPORTANT!
Press the [AC] (All Clear) key to remove the table from the screen.
• Press [SHIFT] then STAT (located above the [1] key).
• Select 4:VAR. You will be presented with four choices.
• Select 2:x̄ and then press the [=] key. This gives you the
mean of 2·45.

Note: If you don't enter any numbers in the frequency column, the calculator
assumes there is only one of each score.

1 Determine the: i range, ii mode, iii median and iv mean for each set of scores.
a 5, 9, 2, 7, 5, 8, 4
b 5, 8, 5, 7, 8, 5, 9, 7
c 21, 24, 19, 25, 24
d 1·3, 1·5, 1·1, 1·5, 1·6, 1·4, 1·7, 1·9

2 Use your calculator to evaluate x̄ for each set of scores.
a 6, 9, 7, 8, 5
  4, 9, 6, 5, 4
  3, 8, 8, 5, 7
  6, 5, 7, 5, 4
b 61, 47, 56, 87, 91
  44, 59, 65, 77, 73
  49, 39, 82, 60, 51
  84, 73, 67, 65, 55
c 8, 8, 8, 8, 8, 8
  6, 6, 6, 6, 6, 6
  7, 7, 7, 7, 7, 7
  9, 9, 9, 9, 3, 3

d Outcome   Frequency
  1         6
  2         9
  3         11
  4         7
  5         2

e Outcome   Frequency
  48        6
  49        11
  50        27
  51        15
  52        8
  53        3

f Outcome   Frequency
  12        2
  13        15
  14        43
  15        67
  16        27
  17        8

3 a Barbara's bowling average after 9 games is
178. If her next three games are 190, 164
and 216, what is her new average (mean)
based on the 12 games she has bowled?
b Rob bowls in a competition where 4 games
are bowled in each night of competition.
His average (mean) is 187. In his first two
games he scores 191 and 163. What must he
total in the last two games if he wants his
average to stay at 187?
4 a Copy and complete this table and then
determine the mode, mean and median
for this set of data.

Outcome   Frequency   fx
5         1
6         4
7         6
8         7
9         5
10        2
Total:

b How many scores were greater than the mean?
How many were less than the mean?
c What is the range of the scores?
d Draw a frequency histogram and polygon for
this data.



5 This table shows the number of each type
of car that passed by Jenny's house in an
hour. This is an example of categorical
data. (The other questions in this exercise
involve numerical data.)

Type of car   Frequency
Ford          12
Holden        15
Hyundai       13
Mazda         20
Nissan        9
Toyota        17

Primary source, as Jenny collected it herself.

a What type of car is the 'mode', i.e. the
one that had the highest frequency?
b Can you determine the mean type of car?
c Can you determine the median?
d What type of graph would best
represent this data?

6 This frequency polygon represents a survey of all families in Allyson Street.

[Graph: frequency polygon; horizontal axis: Number of children (0 to 7); vertical axis: Frequency (0 to 10)]

a Using this data, complete the table.

Children   Frequency   fx
0
1
2
3
4
5
6

b How many families were surveyed in Allyson Street?
c How many children lived in Allyson Street?
d What was the most common number of children per family (mode)?
e If the national average number of children per family is 2·0, is the average number of
children per family in Allyson Street above or below this national average?

7 For each set of data below determine the:
i range ii mode iii median iv mean

a Scores in Maths test out of 30

[Graph: dot plot; horizontal axis: Score (16 to 30)]

b Time taken (minutes) to complete a bicycle race:

Stem | Leaf
3    | 8
4    | 269
5    | 3367
6    | 0133478
7    | 46689
8    | 3489

8 Each golfer in a tournament completed three
rounds of golf. The scores for each round
were tabulated in this grouped frequency
distribution table.

Class    Class centre (c.c.)   Frequency (f)   f × c.c.
64-66    65                    25
67-69                          74
70-72                          73
73-75                          24
76-78                          2
Total:

a Copy and complete this table.
b How many golfers competed in this
tournament?
c What was the maximum possible range?
d What is the modal class?
e Use the class centres to find an
approximation for the mean.
f Determine the median class.
g Construct a frequency histogram.


9 A test of 20 spelling words was given
to one hundred students. These marks
are the results.

Number correct
17 19 15 13 20 19 15 16 14 20
18 19 15 17 19 12  9 20 19 16
12 14 14 18 16 19 20 19 18 14
10  9 18 16 15 11 15 16 19 20
20 13 14 17 17 19 18 14 15 17
18 20 20 17 19 12 11 17 16 19
17 20 19 16 13 17 15 15 20 20
12 14 20 19 17 18 14 18 18 12
 9 16 17 19 20 17 19 17 20 19
17 15 14 20 18 13 14 15 19 18

a Tabulate this data in a frequency
distribution table.
b Draw a frequency histogram.
c Determine the mean number of
spelling words answered correctly.
d If Kylie spelt 17 words correctly, was
she above or below the mean?
e If this information is to be displayed
as a grouped frequency distribution,
would it be best to use a class interval
of 2, 5 or 10?
f Display the information above as a grouped frequency distribution using class intervals of
i 2 (9-10, 11-12, ..., 19-20) ii 5 (6-10, 11-15, 16-20) iii 10 (1-10, 11-20)
Explain how changing the grouping of the data has changed the shape of the display.
10 Year 9 students decided to sell chocolates to earn sufficient
money to buy a Christmas present for each patient in the
local nursing home. The list below shows the number of
chocolates sold by each student.

Number sold
 7  0 35 14 22
17 30 11  5 29
26 20 12 24 15
10 16 32 39 28
19 28 11 24 30
21 32 18 21  4
30 19  6 20 35
38 26 23  8 37

a Prepare a grouped frequency table using a class interval of
i 2 ii 5 iii 10
b Draw a frequency histogram using the table with class interval:
i 2 ii 5 iii 10
c From your results so far, choose the most appropriate class
interval to display these scores. Give reasons for your choice.
11 Mario took a systematic sample of students from Year 9. He identified every tenth student on
the alphabetical roll and recorded their English and Maths percentage test results.
Maths: 72, 63, 87, 94, 55, 46, 66, 81, 62, 84, 97, 59, 75, 77, 49, 57, 68, 77, 51, 70
English: 61, 39, 52, 45, 79, 59, 51, 63, 71, 75, 66, 60, 53, 48, 59, 68, 61, 72, 46, 59



The Maths marks have already been entered into this
ordered back-to-back stem-and-leaf plot.

(Maths) Leaf | Stem | Leaf (English)
             |  3   |
          96 |  4   |
        9751 |  5   |
        8632 |  6   |
       77520 |  7   |
         741 |  8   |
          74 |  9   |

a Copy the plot and enter the English marks.
(Remember to order them first.)
b For each set of marks, determine the:
i range ii median iii mean
c One particular student scored 77 for Maths but
only 75 for English. Can you say this student is
better at Maths than English? Justify your answer.

12 Year 9's exam results have been organised into an ordered stem-and-leaf plot using a class size
of 5 as shown.

Stem   Leaf            f   c.c.   f × c.c.
6(5)   88999               67
7(0)   000012234
7(5)   555566777788
8(0)   00011112344
8(5)   58
9(0)   00012334
9(5)   789
Totals:

a Complete the frequency column and use it to determine the modal class.
b Complete the 'class centre' and 'frequency times class centre' columns. Use the totals
to calculate an approximation for the mean.
c Determine the actual median for these results using the leaf column.
d Determine the median class using the frequency column.
How does this compare with the answer to part c?
13 For this question you will generate the
data yourself by using the random number
generator on your calculator. Each element
of data will be a random number squared.

Class    c.c.    Tally   f   f × c.c.
0-9      4·5
10-19    14·5
20-29    24·5
30-39    34·5
40-49    44·5
50-59    54·5
60-69    64·5
70-79    74·5
80-89    84·5
90-99    94·5
Total:

a Press the following keys:
[SHIFT] [Ran#] [x²] [=]
This will give a 6-digit decimal such
as 0·283 024.
Now take the first two decimal places
and record this as a number between
00 and 99 in the table. Do this a total
of 100 times.
b Determine:
i the modal class
ii the median class
iii an approximate value for the
mean using the class centres.
c Construct a frequency polygon.
d Compare the shape of your graph with others in the class. Do you notice any consistent
pattern? What reasons are there for any pattern in the data?

INVESTIGATION 14:01 ADDING AND AVERAGING
Cell B10 contains: =SUM(B2:B8)

     A        B         C   D        E         F   G        H         I   J        K         L   M        N         O
1    DAY      TAKINGS       DAY      TAKINGS       DAY      TAKINGS       DAY      TAKINGS       DAY      TAKINGS
2    27-Oct   $2,490        3-Nov    $3,260        10-Nov   $2,800        17-Nov   $2,570        24-Nov   $2,170
3    28-Oct   $4,360        4-Nov    $4,040        11-Nov   $4,690        18-Nov   $2,920        25-Nov   $3,640
4    29-Oct   $1,440        5-Nov    $1,420        12-Nov   $1,520        19-Nov   $2,360        26-Nov   $1,420
5    30-Oct   $1,660        6-Nov    $1,960        13-Nov   $1,340        20-Nov   $1,100        27-Nov   $1,350
6    31-Oct   $1,370        7-Nov    $1,180        14-Nov   $1,900        21-Nov   $1,170        28-Nov   $1,480
7    1-Nov    $1,430        8-Nov    $1,230        15-Nov   $1,440        22-Nov   $1,700        29-Nov   $1,350
8    2-Nov    $1,860        9-Nov    $1,510        16-Nov   $1,660        23-Nov   $1,550        30-Nov   $1,440
9
10            $14,610
11
12

The calculating power of a spreadsheet makes it an extremely useful statistical tool.

Part of a spreadsheet is shown above. It shows the daily takings for the Lazy Lizard Cafe for the
5-week period from 27 October until 30 November.
• Enter this information into a spreadsheet.

Note the formula =SUM(B2:B8) has been entered in cell B10. This gives the weekly takings by
adding the numbers in cells B2 to B8.

1 Write formulas to give the weekly takings for the other 4 weeks.

2 Now write a formula that could give you the total takings for the 5-week period.
Each row of the spreadsheet gives the takings for the same day of the week. For example, the
days are all Saturdays in row 2 and all Sundays in row 3.

The formula =AVERAGE(B2,E2,H2,K2,N2) will calculate the mean of the numbers in
cells B2, E2, H2, K2 and N2. Typing this formula into cell O2 will give the average sales
for Saturday.
• Copy this formula into cells O3 to O8 using 'Fill Down'. Use the results to find the average
sales for each day of the week from Saturday to Friday.

3 Which day of the week has the highest average takings?

4 Which day has the lowest average takings?
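
For readers without a spreadsheet, the same adding and averaging can be sketched in Python. This is only an illustration: the list layout and the names `takings`, `weekly_totals` and `daily_averages` are assumptions, with each inner list holding one spreadsheet row (the same day of the week across the five weeks).

```python
# Rows 2 to 8 of the spreadsheet: one row per day of the week,
# one entry per week (columns B, E, H, K and N).
takings = [
    [2490, 3260, 2800, 2570, 2170],  # row 2 (Saturdays)
    [4360, 4040, 4690, 2920, 3640],  # row 3 (Sundays)
    [1440, 1420, 1520, 2360, 1420],  # row 4
    [1660, 1960, 1340, 1100, 1350],  # row 5
    [1370, 1180, 1900, 1170, 1480],  # row 6
    [1430, 1230, 1440, 1700, 1350],  # row 7
    [1860, 1510, 1660, 1550, 1440],  # row 8
]

# Like =SUM(B2:B8): add down each week's column
weekly_totals = [sum(week) for week in zip(*takings)]

# Like =AVERAGE(B2,E2,H2,K2,N2): average across each row
daily_averages = [sum(row) / len(row) for row in takings]

print(weekly_totals)
print(daily_averages)
```

Here `zip(*takings)` regroups the rows into columns, so each weekly total adds seven daily figures, just as the spreadsheet formula adds cells B2 to B8.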



The previous section reminded you how to sort data into a frequency distribution table, and how
to analyse the data using various statistical measures. The data could also be displayed in the form
of a frequency histogram or polygon.
A further column that may be attached to the frequency distribution table is the 'cumulative
frequency' or cf column. This column gives the progressive total of the outcomes.
This column has been added to the frequency distribution table below.

WORKED EXAMPLE
For a class of 26 students the following marks out of 10 were obtained in a test.

Outcome (x)   Tally     Frequency (f)   Cumulative frequency (cf)
3             |||       3               3
4             |||       3               6
5             ||||      4               10
6             |||||     5               15
7             ||||      4               19
8             ||        2               21
9             |||       3               24
10            ||        2               26
Total:                  26

• 15 students scored 6 or less. Since 4 students scored 7, the cumulative
frequency of 7 is 15 + 4 or 19.
• The last figure in the cf column must be equal to the sum of the frequencies,
as all students are on or below the highest outcome.
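
The progressive total in the cf column is a running sum of the frequency column. A minimal Python sketch, using the frequencies from this worked example (the variable names are chosen here for illustration):

```python
from itertools import accumulate

outcomes = [3, 4, 5, 6, 7, 8, 9, 10]
frequencies = [3, 3, 4, 5, 4, 2, 3, 2]

# Each cf entry is the previous cf plus the current frequency
cumulative = list(accumulate(frequencies))

for x, f, cf in zip(outcomes, frequencies, cumulative):
    print(x, f, cf)
```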

The cumulative frequency can also be displayed
in the form of a histogram or polygon. However,
there are some differences, as noted below.
• The histogram progressively steps upwards to
the right.
• The polygon is obtained by joining the top
right-hand corner of each column. (Why is
it drawn this way?)
• Imagine that the column before the
'3' column has zero height.

Another name for the cumulative frequency
polygon is the 'ogive'.

[Graph: cumulative frequency histogram and polygon; horizontal axis: Class test marks (3 to 10); vertical axis: Cumulative frequency (0 to 26)]

Finding the median from a frequency distribution table
The cumulative frequency can be used to find the median of a set of scores.

WORKED EXAMPLES
1
Outcome (x)   Frequency (f)   Cumulative frequency
3             5               5
4             3               8
5             7               15
6             8               23
7             4               27
8             2               29

The middle score is the 15th score
(14 above it and 14 below it).
The 15th score is a 5.
∴ Median = 5

2
Outcome (x)   Frequency (f)   cf
5             2               2
6             4               6
7             3               9
8             7               16
9             5               21
10            1               22

Here, there is an even number of scores (22)
so the middle two scores are the 11th and
12th scores.
From the cf column it can be seen that each
of these scores is 8.
∴ Median = 8

3
Outcome (x)   Frequency (f)   cf
5             6               6
6             9               15
7             5               20
8             4               24
9             3               27
10            3               30

Here, there is also an even number of scores
(30) so the middle two scores are the 15th
and 16th scores.
In this example, the 15th score is 6 and the
16th score is 7. The median is the average
of these two scores.
∴ Median = 6·5 (or 6½)
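
The counting procedure used in these three examples can be sketched in Python. This is an illustration only; the helper names `score_at` and `median_from_table` are assumptions, not part of the text.

```python
from itertools import accumulate


def score_at(position, outcomes, cumulative):
    """Return the outcome of the score in the given position (1st, 2nd, ...)."""
    for x, cf in zip(outcomes, cumulative):
        if position <= cf:      # first row whose cf reaches the position
            return x
    raise ValueError("position is beyond the last score")


def median_from_table(outcomes, frequencies):
    """Find the median using the cumulative frequency column."""
    cumulative = list(accumulate(frequencies))
    n = cumulative[-1]          # total number of scores
    if n % 2 == 1:
        # Odd number of scores: one middle score
        return score_at((n + 1) // 2, outcomes, cumulative)
    # Even number of scores: average the two middle scores
    lower = score_at(n // 2, outcomes, cumulative)
    upper = score_at(n // 2 + 1, outcomes, cumulative)
    return (lower + upper) / 2


print(median_from_table([3, 4, 5, 6, 7, 8], [5, 3, 7, 8, 4, 2]))
print(median_from_table([5, 6, 7, 8, 9, 10], [2, 4, 3, 7, 5, 1]))
print(median_from_table([5, 6, 7, 8, 9, 10], [6, 9, 5, 4, 3, 3]))
```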



Finding the median from an ogive
The cumulative frequency polygon, or ogive, can be used to find the median. Note that the
method used is different from the method used for the frequency table.

WORKED EXAMPLE 1
To find the median, follow these steps.
• Find the half-way point (½ × 26 = 13).
• Draw a horizontal line from this point to the ogive.
• Then draw a vertical line to meet the horizontal axis.
• This meets the horizontal axis within the '6' column.
∴ The median is 6.

[Graph: ogive with the half-way point marked; horizontal axis: Outcome (3 to 10); vertical axis: Cumulative frequency (0 to 26)]

For grouped data, the outcomes are grouped into classes with a representative class centre (c.c.).
The horizontal axis usually shows these class centres and the median class is found as above.
Alternatively, an estimate for a median could be read from the horizontal axis, as shown below.

WORKED EXAMPLE 2
The percentage results for sixty students in an examination are given in this table.

Class     Class centre (c.c.)   Tally                Frequency (f)   f × c.c.   cf
29-37     33                    ||                   2               66         2
38-46     42                    |||||                5               210        7
47-55     51                    ||||| |||            8               408        15
56-64     60                    ||||| ||||| ||       12              720        27
65-73     69                    ||||| ||||| ||||     14              966        41
74-82     78                    ||||| ||||           9               702        50
83-91     87                    ||||| ||             7               609        57
92-100    96                    |||                  3               288        60
Total:                                               60              3969

When constructing frequency diagrams for grouped
data, the only point to note is that the columns are
indicated on the horizontal axis by the class centres.
The diagram for the worked example above would
look like this.

[Graph: cumulative frequency histogram and polygon; horizontal axis: Exam marks (class centres 33, 42, 51, 60, 69, 78, 87, 96); vertical axis: Cumulative frequency (0 to 60)]

• The cumulative frequency polygon can be drawn
by joining the top right corners of each column.
• There are 60 scores altogether, so to find the median
class we come across from 30 until we meet the
polygon and then down to the horizontal axis.
• Clearly the median class is 65-73.
• An estimate of the median mark can be read from
the horizontal axis, i.e. 67.
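
Reading a value from a graph is only approximate. As a rough check, the same reading can be sketched in Python with straight-line interpolation along the polygon. This sketch rests on assumptions that are not stated in the text: each column's right edge is taken to lie half-way between neighbouring class centres (37·5, 46·5, and so on), the polygon is taken to start at 28·5 with zero height, and the function name `ogive_median` is hypothetical.

```python
def ogive_median(first_lower_edge, upper_edges, cumulative):
    """Estimate the median by interpolating along the ogive at half the total."""
    half = cumulative[-1] / 2
    prev_x, prev_cf = first_lower_edge, 0
    for x, cf in zip(upper_edges, cumulative):
        if cf >= half:
            # Move along the segment from (prev_x, prev_cf) to (x, cf)
            return prev_x + (x - prev_x) * (half - prev_cf) / (cf - prev_cf)
        prev_x, prev_cf = x, cf
    raise ValueError("cumulative frequencies never reach the half-way point")


# Assumed right-hand edges of the columns for classes 29-37, 38-46, ..., 92-100
upper_edges = [37.5, 46.5, 55.5, 64.5, 73.5, 82.5, 91.5, 100.5]
cumulative = [2, 7, 15, 27, 41, 50, 57, 60]

estimate = ogive_median(28.5, upper_edges, cumulative)
print(round(estimate, 1))
```

Under these assumptions the estimate is about 66·4, which falls inside the median class 65-73 and is close to the value read from the graph.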

I like to accumulate!
The cumulative frequency of an outcome gives
the number of outcomes equal to, or less than,
that particular outcome.

1 Calculate the total of the frequency column and complete the cumulative frequency column
in each of these tables.

a Outcome (x)   f    cf
  0             3    3
  1             8    11
  2             11   22
  3             17
  4             9
  5             2
  Total:

b Outcome (x)   f    cf
  9             1
  10            13
  11            22
  12            30
  13            21
  14            13
  Total:



c Outcome (x)   f    cf
  0             1
  1             0
  2             3
  3             8
  4             14
  5             20
  6             31
  7             32
  8             28
  9             11
  10            5
  Total:

d Outcome (x)   f    cf
  15            4
  20            8
  25            3
  30            9
  35            7
  40            10
  45            15
  50            8
  55            10
  60            2
  65            4
  Total:

x is the 'outcome', f is the 'frequency'.

2 Use the cumulative frequency histogram shown
to complete the table below.

[Graph: cumulative frequency histogram; horizontal axis: Score (5 to 10); vertical axis: Cumulative frequency (0 to 20)]

Outcome   Frequency (f)   fx      cf
5
6
7
8
9
10
Total:    Σf =            Σfx =

From the table, determine the:
a mode
b mean
c median
d range

To find the:
• mode, use the f column
• median, use the cf column
• mean, use the fx and f columns.
3 Five coins were tossed many times and the number of
heads recorded. The cumulative frequency for each
number of heads was calculated and a cumulative
frequency graph was drawn.

[Graph: cumulative frequency graph; horizontal axis: Number of heads (0 to 5); vertical axis: Cumulative frequency (0 to 20)]

a How many times did zero heads occur?
b How many times did 1 or less heads occur?
c How many times did 2 or less heads occur?
d How many times did 3 or less heads occur?
e Think about your answers to c and d.
How many times did 3 heads occur?
f How many times were the 5 coins tossed?

4 Sharon organises her family's football tipping competition. Each week Dad, Sharon, Adam and
Bron have to pick the results of the 7 rugby league matches played. The table below shows the
results for rounds 1 to 5.

          Sharon               Adam                 Dad                  Bron
          Score  Prog. Total   Score  Prog. Total   Score  Prog. Total   Score  Prog. Total
Round 1   4      4             6      6             4                    4      4
Round 2   3      7             3      9             4                    3      7
Round 3   6      13            4      13            6                    5      12
Round 4   4      17            4      17            2                    3      15
Round 5   5      22            4      21            5                    4      19

a 'Prog. Total' is short for 'Progressive Total'. It is like a cumulative frequency column.
Complete Dad's Prog. Total column.
b Who was leading the competition at the end of
i round 1 ii round 3 iii round 5?
c What has been the highest score achieved in a round? How many times has this happened?
d Who has had the lowest score in a round?
e In round 6 the scores were: Sharon 6, Adam 6, Dad 7, Bron 5. Use these results to add the
next line in the table.

5 Two dice were rolled one hundred times and the total showing on the two upper faces was
recorded to obtain this set of scores.

5 7 6 12 10 2 4 5 7 9 7 6 4 3 5 8 6 3 5 6
5 8 7 9 6 8 9 4 8 7 8 4 8 4 8 7 6 7 10 5
9 5 6 5 2 9 5 9 11 10 6 7 7 7 10 6 11 10 7 8
8 3 9 3 5 8 7 12 10 9 7 8 7 5 6 4 5 8 9 11
10 6 9 6 7 8 9 10 11 3 6 4 7 2 4 8 8 4 6 7



a By completing a frequency distribution table, determine the mode and mean of the set
of scores.
b Complete a cumulative frequency column and use it to find the median.
c Look at the cumulative frequency column to determine how many scores were less than
the median.

6 In the game of golf, a par is the number of strokes allocated to complete a given hole. Holes
can only have a par of 3, 4 or 5 strokes. If a par is not scored, the score is said to be either
under or over par. Different holes can be rated for difficulty by analysing players' scores on
the hole.
The tables below show the scores achieved by all the players in a recent British Open on
two holes.

5th Hole Par 4

Player's score   Frequency   Cumulative frequency
3                54
4                314
5                85
6                3

7th Hole Par 5

Player's score   Frequency   Cumulative frequency
3                20
4                211
5                198
6                23
7                4

a Complete the cumulative frequency for each hole.


b For the 5th hole:
i how many players scored par or better (4 or under)?
ii how many players scored above par?
c For the 7th hole:
i how many players scored par or better (5 or under)?
ii how many players scored above par?
d What percentage of players scored above par on the 5th and 7th holes respectively? What
does this indicate about the difficulty of the respective holes?
e Draw a cumulative frequency polygon for both sets of data. How does the shape of the
polygon indicate the degree of difficulty of the hole?

7 The table below shows the players' scores for the second round in the same British Open golf
tournament. The par for the course (sum of the pars for all 18 holes) is 71.

Score                  68  69  70  71  72  73  74  75  76  77  78  79
Frequency              1   0   6   9   13  16  8   8   8   3   0   1
Cumulative frequency

Complete the cumulative frequency row and use it (or some other method) to answer the
following questions.
a How many players scored under par (lower than 71)?
b How many players scored par or better (71 or lower)?
c How many players scored worse than par (higher than 71)?
8 Use the ogive to find the median from each graph.
a Mark out of 5
b Mark out of 14
c Mark out of 10
d Mark out of 8
e Mark out of 8
f Mark out of 8

[Graphs a to f: ogives; horizontal axis: Outcome; vertical axis: Cumulative frequency]



9 The number of cans of drink sold by a shop each day is shown in this table of grouped data.
a Construct a cumulative frequency histogram and ogive.
b Determine the median class.

No. of cans sold (x)   c.c.   Frequency (f)   cf
16-22                  19     7               7
23-29                  26     18              25
30-36                  33     18              43
37-43                  40     15              58
44-50                  47     8               66
51-57                  54     4               70

10 From this frequency histogram:
a determine what the class groupings must have been if the
data are discrete whole numbers
b construct a cumulative frequency histogram and ogive
c determine the median class.

[Graph: frequency histogram; horizontal axis: Results (class centres 10, 19, 28, 37, 46); vertical axis: Frequency (0 to 14)]

11 From this cumulative frequency diagram:
a determine the median class
b complete the table below to show the frequency of
each class

c.c.   2   7   12   17   22   27   32
f

c determine the modal class
d calculate the mean.

[Graph: cumulative frequency diagram; horizontal axis: Outcome (class centres 2, 7, 12, 17, 22, 27, 32); vertical axis: Cumulative frequency (0 to 30)]

12 The exam results for Year 9 students have been collated in this grouped frequency distribution table.

Class     c.c.    Tally                        Frequency (f)   f × c.c.   cf
1-10      5·5     ||||| ||||                   9               49·5       9
11-20     15·5    ||||| ||||| |||||            15              232·5      24
21-30     25·5    ||||| ||||| ||||| ||||       19              484·5      43
31-40     35·5    ||||| ||||| ||||| ||         17              603·5      60
41-50     45·5    ||||| ||||| ||||| |||||      20              910        80
51-60     55·5    ||||| ||||| ||||| ||         17              943·5      97
61-70     65·5    ||||| ||||| ||||| ||||| |    21              1375·5     118
71-80     75·5    ||||| ||||| ||||| |          16              1208       134
81-90     85·5    ||||| ||||| ||||| |||        18              1539       152
91-100    95·5    ||||| |||                    8               764        160
Total:                                         Σf = 160        Σ(f × c.c.) = 8110
a What is the greatest possible range for this data?
b What is the modal class?
c Calculate a value for the mean.
d Construct a cumulative frequency histogram and ogive.
e From your graph, determine the median class.
f Considering part e, what would be a reasonable single numerical value for the median?
g What percentage of students obtained: i more than 80 ii 20 or less iii more than 50?

Weather bureaus around the world keep statistics on many aspects of weather.
Average rainfall and average temperature are often quoted in weather broadcasts.





PREP QUIZ 14:03

Set A: 8, 8, 8, 8, 8      Set D: 8, 8, 8, 8, 8
Set B: 7, 8, 8, 8, 8      Set E: 7, 8, 8, 8, 9
Set C: 1, 8, 8, 8, 8      Set F: 6, 7, 8, 9, 10

Find the range of the scores in:
1 Set A 2 Set B 3 Set C
4 Which set contains an outlier?
5 Is the range affected by an outlier?

Find the mean in:
6 Set D 7 Set E 8 Set F
9 Which set of scores is the least spread out?
10 Which set of scores is the most spread out?

Scores that are unusually high or low are called outliers.

• So far we have concentrated on finding the measures of central tendency: mode, mean and
median. These values tell us how the scores tend to cluster.
• We have used the range as a measure of spread. But, as seen in the Prep quiz above, the range is
easily affected by an outlier.
• A much better measure of spread than the range is the interquartile range (IQR). This is the
range of the middle 50% of scores.

WORKED EXAMPLE 1
Find the interquartile range of the scores:
1, 2, 2, 5, 7, 9, 10, 10, 11, 11, 11, 11

[Graph: dot plot of the scores; horizontal axis: Score (1 to 12)]

Method 1
• Make sure that the scores are in ascending order.
• Divide the scores into four equal groups. (This is not always possible. See Worked Example 2.)

1, 2, 2, | 5, 7, 9, | 10, 10, 11, | 11, 11, 11

1st quartile (25th percentile): $Q_1 = \frac{2 + 5}{2} = 3·5$
Median (50th percentile): $Q_2 = \frac{9 + 10}{2} = 9·5$
3rd quartile (75th percentile): $Q_3 = \frac{11 + 11}{2} = 11$

The 1st quartile ($Q_1$) is 3·5, which lies half-way between 2 and 5.
The 2nd quartile (median) is 9·5, which lies half-way between 9 and 10.
The 3rd quartile ($Q_3$) is 11.
• The interquartile range is the difference between the 3rd and 1st quartiles.
Interquartile range = $Q_3 - Q_1$
= 11 − 3·5
= 7·5

14 Statistics
Method 2
• Construct a cumulative frequency polygon.
• Come across from the vertical axis to the polygon from positions representing 25%, 50% and
75% of the scores. Take the readings on the horizontal axis to obtain the 1st quartile, median
and 3rd quartile.

x    f    cf
1    1    1
2    2    3
3    0    3
4    0    3
5    1    4
6    0    4
7    1    5
8    0    5
9    1    6
10   2    8
11   4    12
Σf = 12

[Cumulative frequency polygon: cumulative frequency (0 to 12) against Score. Guide lines are drawn across at 3 (1/4 of 12), 6 (1/2 of 12) and 9 (3/4 of 12).]

The 1st quartile (Q1) is somewhere in the range 2·5 to 4·5, as the polygon has the height 3 for
those values. We resolve this problem by taking the average of 2·5 and 4·5.
∴ Q1 = (2·5 + 4·5) / 2
= 3·5

The median (Q2) is seen to be 9·5.

The 3rd quartile (Q3) is seen to be 11 (or 10·75 if we consider the horizontal scale to be
continuous rather than discrete).

The interquartile range = Q3 - Q1
= 11 - 3·5
= 7·5

'Quartile' is similar to 'quarter'. Quartiles divide the data into four equal groups.
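The table used in Method 2 can be rebuilt with a short Python sketch. The variable names are chosen for this illustration only; the code builds the f and cf columns for the twelve scores and works out the heights on the vertical axis that correspond to 25%, 50% and 75% of the scores. Reading across to the polygon is still done on the graph.

```python
from itertools import accumulate

scores = [1, 2, 2, 5, 7, 9, 10, 10, 11, 11, 11, 11]

# Every outcome from the lowest to the highest score, including those with f = 0.
outcomes = list(range(min(scores), max(scores) + 1))
freqs = [scores.count(x) for x in outcomes]
cum_freqs = list(accumulate(freqs))  # running total of the f column
total = cum_freqs[-1]

print("x  f  cf")
for x, f, cf in zip(outcomes, freqs, cum_freqs):
    print(x, f, cf)

# Heights on the cumulative frequency axis for the 1st quartile, median and 3rd quartile.
heights = [total * p / 100 for p in (25, 50, 75)]
print(heights)  # [3.0, 6.0, 9.0]
```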

The interquartile range is more useful when the number of scores is large. When the
number of scores is small (e.g. ? ) , it is hard to define 'the middle half of the scores'.

The interquartile range is:


• the range of the middle 50% of the scores
• the difference between the points below which 75% and 25% of scores fall
(the difference between the third and first quartiles)
• the median of the upper half of the scores minus the median of the lower half.



WORKED EXAMPLE 2
Find the interquartile range for the following sets of scores.
Set A: 1, 2, 2, 5, 7, 9, 10, 10, 11, 11, 11
Set B: 1, 2, 2, 5, 7, 9, 10, 10, 11, 11

Solution
When the number of scores in a set is not a multiple of 4,
they cannot be divided into 4 equal groups.

Set A has 11 scores. Hence, the middle score, 9, is the median (Q2).
The middle score of the bottom 5 scores is Q1.
The middle score of the top 5 scores is Q3.

1, 2, (2), 5, 7, (9), 10, 10, (11), 11, 11

1st quartile: Q1 = 2     Median: Q2 = 9     3rd quartile: Q3 = 11

The interquartile range = Q3 - Q1
= 11 - 2
= 9

Set B has 10 scores. Hence, the median is between the 5th and 6th scores.
This divides the scores into two groups of 5 scores.
The middle scores of the bottom and top groups are Q1 and Q3 respectively.

1, 2, (2), 5, 7 | 9, 10, (10), 11, 11

1st quartile: Q1 = 2     Median: Q2 = (7 + 9) / 2 = 8     3rd quartile: Q3 = 10

The interquartile range = Q3 - Q1
= 10 - 2
= 8
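The rule used in Worked Examples 1 and 2 (Q1 is the median of the lower half, Q3 is the median of the upper half, and the middle score is left out of both halves when the number of scores is odd) can be sketched in Python as follows. The function names are invented for this illustration. Spreadsheets and other software may use a different quartile rule and give slightly different answers.

```python
from statistics import median


def quartiles(scores):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ordered = sorted(scores)
    n = len(ordered)
    lower = ordered[: n // 2]        # bottom half
    upper = ordered[(n + 1) // 2 :]  # top half (skips the middle score if n is odd)
    return median(lower), median(ordered), median(upper)


def interquartile_range(scores):
    q1, _, q3 = quartiles(scores)
    return q3 - q1


example_1 = [1, 2, 2, 5, 7, 9, 10, 10, 11, 11, 11, 11]
set_a = [1, 2, 2, 5, 7, 9, 10, 10, 11, 11, 11]
set_b = [1, 2, 2, 5, 7, 9, 10, 10, 11, 11]

print(interquartile_range(example_1))  # 7.5
print(interquartile_range(set_a))      # 9
print(interquartile_range(set_b))      # 8
```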

Exercise 14:03
Foundation worksheet 14:03: Interquartile range

1 Use Method 1 (Worked Example 1 on page 431) to find the interquartile range of each set of
scores. (Rewrite the scores in order as the first step in each case.)
a 6,4,3,8,5,4,2,7
b 1,5,2,6,3,8,7,5,4,5,7,9
c 60,84,79,83,94,88,92,99,80,90,95,78
d 15,43,30,22,41,30,27,25,28,20, 19,22,25,24,33,31,41,40,49,37
e 56,83,60,72,61,52,73,24,88,70,57,63,60,48,36,53,65,49,62,65

2 The scores of 32 students have been used to graph this cumulative frequency histogram and
polygon. Use the graph to find:
a the median, Q2
b the 1st quartile, Q1
c the 3rd quartile, Q3
d the interquartile range, Q3 - Q1.
(Note: Here the answers are whole numbers.)
[Cumulative frequency histogram and polygon: cumulative frequency (0 to 32) against Score (7 to 18)]

3 The same 32 students sat for a second test. The results have been used to draw this graph.
Use the graph to find:
a the median, Q2
b the 1st quartile, Q1
c the 3rd quartile, Q3
d the interquartile range, Q3 - Q1.
(Note: Here some answers will involve decimals.)
[Cumulative frequency histogram and polygon: cumulative frequency (0 to 32) against Score (0 to 10)]

4 Make up a frequency distribution table for these scores.
7 8 6 9 4 6 5 5 4 2 3 7 6 6 5 8 4 5 6 4
7 6 8 5 3 4 8 9 6 5 4 5 7 3 6 6 5 5 5 6
Use your frequency distribution table to find:
a the interquartile range using Method 1
b the interquartile range using Method 2.

5 Use a cumulative frequency polygon to find the interquartile range for each of the following.

a Marks   Frequency
  16      3
  17      4
  18      5
  19      5
  20      3

b Times   Frequency
  35      3
  36      4
  37      7
  38      10
  39      18
  40      18



6 Find the interquartile ranges of the following sets of scores.
(For Question 6, see Worked Example 2.)
a 25, 45, 46, 50, 58, 58, 65, 66, 70, 90
b 25, 25, 26, 26, 26, 28, 29, 30, 30, 32, 32
c 45, 45, 56, 56, 58, 59, 59, 59, 80

7 Use the cumulative frequency polygons to find the interquartile range of each set of scores.
a [Cumulative frequency polygon: cumulative frequency (0 to 30, with 7·5, 15 and 22·5 marked) against Score (11 to 18)]
b [Cumulative frequency polygon: cumulative frequency (0 to 30) against Score (1 to 8)]

8 A cumulative frequency polygon (ogive) can also be used to obtain the interquartile range
for grouped data. The weights of 128 boys were measured to the nearest kilogram and
grouped in classes of 50-54 kg, 55-59 kg and so on up to 85-89 kg.
Use the ogive to estimate the following.
a 1st quartile
b 3rd quartile
c interquartile range.
[Ogive: cumulative frequency (0 to 128) against Weight (kg), with class centres 52, 57, 62, 67, 72, 77, 82, 87]

9 Find the quartiles for each of the following sets of data and then find the interquartile range.
(Note that in both the dot plot and the stem-and-leaf plot, the scores have already been
arranged in order.)
a [Dot plot on a Score axis from 16 to 30]
b Stem | Leaf
  3 | 8
  4 | 2 6 9
  5 | 3 3 6 7
  6 | 0 1 3 3 4 7 8
  7 | 4 6 6 8 9
  8 | 3 4 8 9

FUN SPOT 14:03 WHY DID THE ROBBER FLEE FROM THE MUSIC STORE?

Work out the answer to each part and write the letter for that part
in any box that is above the correct answer.
In a coordination test, 12 students were rated 1 (poor) to
6 (outstanding).
The results are shown in the frequency table.

x    f    cf
1    4
2    2
3    1
4    2
5    1
6    2

What is:
A the mode                          E the median
E the range                         E the highest score
F the mean                          R the fraction of scores that are 6
F the cumulative frequency of 5     H the cumulative frequency of 6?
How many people:
H were rated as outstanding
H were rated higher than 6
I were rated less than 4
L were rated poor?

One of these students is selected at random.
What is the probability that the student's rating is:
N 3                O 1
T less than 3      U less than 5
W anything but 3?




In Stage 4, the dot plot and stem-and-leaf plot were used to illustrate certain aspects of a set of
scores or distribution.
Another type of display is the box-and-whisker plot, or more simply, box plot. This is drawn using
a five-point summary of the data as shown below.

1 The minimum score
2 The first quartile, Q1
3 The median, Q2
4 The third quartile, Q3
5 The maximum score
[Diagram: a box plot with these five points labelled 1 to 5]

In a box plot:
• the box shows the middle 50% (the interquartile range) between Q 1 and Q3
• the whiskers extend from the box to the highest and lowest scores
• the whiskers show the range of the scores.

WORKED EXAMPLE 1

The scores in an assessment task for a class were as follows.


40 71 74 20 43 63 83 57 63 26 43 87 74 89 66 63
Find the five-point summary of these marks and use it to construct a box plot.

Solution
Rearrange the scores in order and find Q2, then Q1 and Q3.
20 26 40 43 | 43 57 63 63 | 63 66 71 74 | 74 83 87 89
Q1 = 43, Q2 = 63, Q3 = 74
The five-point summary is (20, 43, 63, 74, 89).
Use the five-point summary and a suitable scale (1 mark = 1 mm) to construct the
box-and-whisker diagram or box plot.
[Box plot of the marks on a Score axis from 20 to 90]
Use the box plot to find the:
a range
b interquartile range
c median
d percentage of scores above 60
e percentage of scores below 36.
[Box plot on a Score axis from 20 to 80]

Solution
a Range = maximum score - minimum score
= 74 - 25
= 49
b Interquartile range = Q3 - Q1
= 60 - 36
= 24
c Median = 54
d As Q3 = 60, 25% of the scores are above 60.
e As Q1 = 36, 25% of the scores are below 36.

1 Use each box plot to find the:
i median ii range iii interquartile range.
[Box plots a, b and c drawn above a shared Score axis from 40 to 80]

2 Find the five-point summary for each of the following sets of data and use it to construct a
box plot.
a 7, 7, 8, 8, 8, 9, 9, 9, 10, 12, 12, 12
b 16, 24, 25, 25, 26, 28, 28, 28, 28, 30, 32, 33, 34, 34, 37, 38
c 14, 19, 29, 36, 40, 43, 43, 44, 46, 46, 47, 49

3 Find the five-point summary for each of the following sets of data and use it to construct
a box plot.
a 43, 37, 42, 48, 39, 39, 40, 40, 44, 47, 45, 44
b 75, 78, 63, 59, 68, 72, 74, 83, 87, 86,
59, 75, 82, 82, 84, 85, 77, 76, 70, 83

A dot plot or stem-and-leaf plot is helpful when you have to sort unordered data.



4 These double box plots represent the distance travelled to school by members of
Year 9 and Year 10.
[Double box plots for Year 10 and Year 9: Distance travelled (km), on an axis from 2 to 16]

a What percentage of Year 9 students travel:
i farther than 7 km ii farther than 5 km?
b What percentage of Year 10 students travel:
i farther than 7 km ii farther than 5 km?
c Find the interquartile range for:
i Year 9 ii Year 10
d Which group does more travelling?

5 a Use the dot plot to find the five-point summary for the scores.
b Construct a box plot for the scores.
[Dot plot on a Score axis from 17 to 24]

6 The marks of 24 students in a half-yearly test are recorded in the stem-and-leaf plot.
a Find the five-point summary for these marks.
b Construct a box plot for the marks.

Test scores
Stem | Leaf
2 | 6 7 8
3 | 5 8 8 9
4 | 0 1 3 8 9
5 | 2 5 6 7 7
6 | 7 9 9
7 | 5 5
8 | 2 2

7 Ray and Ken play 40 games of golf over a 1-year period. Their scores are shown on the
double box plots below.
[Double box plots for Ray and Ken on a Score axis from 72 to 88]
a What is the five-point summary for Ken's scores?
b Which golfer's scores have the smaller range?
c Which golfer's scores have the smaller interquartile range?
d Given your answers to b and c, which golfer do you think is the most consistent?
Give a reason for your answer.

8 Rick recorded how long it took him to drive to work over 28 consecutive days. The times
taken to the nearest minute are shown in the frequency table.

Time (minutes)   38   39   40   41   42   43   44   52
Frequency         1    2    6    7    5    4    2    1

One year later, after the addition of traffic lights and other traffic management measures,
Rick repeated the process and obtained the following results.

Time (minutes)   38   39   40   41   42   43   45
Frequency         1    4    8    9    4    1    1

Draw double box plots to illustrate the before and after results and use them to comment on
the effectiveness of the traffic changes.

INVESTIGATION 14:04 CODE BREAKING AND STATISTICS


Codebreakers use statistics to help decipher codes. They use the facts that certain letters are
more common than others. What letter of the alphabet appears most often? What is the most
common vowel? What consonant occurs most often?

1 Write what you think the answers are to the three questions above.

2 Use the statements above to do an alphabetic analysis. Were your answers in 1 supported by
the statistics?
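An alphabetic analysis can be done by hand with a tally, or with a short program. The Python sketch below is one possible approach; the function name letter_frequencies is made up for this illustration, and the sample text is simply the first sentence of this investigation. Any passage can be used in its place.

```python
from collections import Counter


def letter_frequencies(text):
    """Count each letter, ignoring case, spaces, digits and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


sample = "Codebreakers use statistics to help decipher codes."
counts = letter_frequencies(sample)

# List the three most common letters with their frequencies.
for letter, freq in counts.most_common(3):
    print(letter, freq)
```

A longer passage gives a more reliable picture of which letters are most common.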


As well as deciphering codes, mathematicians are often employed to devise security codes to prevent
access by unauthorised users. In particular, cryptographers are employed to stop computer hackers
from accessing computer records.

On one side we have mathematicians trying to break codes, and on the other side we have
mathematicians trying to design codes that cannot be broken.



Statistics are often used to look at the similarities and differences between sets of data. Here are
some examples.
• Teachers are often interested in comparing the marks of a class on different topics or
comparing the marks of different classes on the same topic.
• Medical researchers could compare the heart rates of different groups of people after exercise.
• Coaches might compare the performances of different players over a season or the same player
over different seasons.
• Managing directors of companies could compare sales and profits over different periods.

As well as calculating the measures of cluster (the mean, median and mode) and the measures of
spread (the range and interquartile range), a comparison would usually involve using graphical
methods. Back-to-back stem-and-leaf plots, double-column graphs, double box plots and
histograms are useful ways of comparing sets of data.

Shape of a distribution
A significant feature of a set of data is its shape. This is most easily seen using a histogram or stem-
and-leaf plot. For some data sets with many scores and a large range, the graph is often shown as
a curve.

The graphs below show the results of 120 students on four different problem-solving tests.

[Graph A and Graph B: frequency histograms, Frequency (0 to 30) against Score (4 to 10)]

• Graphs A and B are examples of symmetric distributions.


• Graph A has one mode; it is said to be unimodal.
• Graph B has two modes; it is bi-modal.
• A unimodal symmetric distribution is quite common in statistics and is called a normal
distribution.
• Symmetric distributions are evenly distributed about the mean.

[Graph C and Graph D: frequency histograms, Frequency (0 to 30) against Score (4 to 10)]

• Graphs C and D are examples of skewed distributions.


• If most of the scores are at the low end, the skew is said to be positive.
• If most of the scores are at the high end, the skew is said to be negative.

WORKED EXAMPLE
Our class was given a topic test in which we performed poorly. Our teacher decided to
give a similar test one week later, after a thorough revision of the topic. The results
are shown on this back-to-back stem-and-leaf plot. (This is an ordered display.)
Compare the results of the class on the two tests. Note that two students were absent
during Test 1.

Test scores (4 | 1 represents 41)
       Test 1 | Stem | Test 2
    9 8 6 6 0 |  3   |
9 7 7 3 1 1 1 |  4   | 3 6 6 8 8
  8 8 5 3 3 0 |  5   | 1 7 9 9
    9 8 7 5 3 |  6   | 3 8 9
              |  7   | 0 5 5 5 8 9
              |  8   | 2 6 7 7
            0 |  9   | 0 0 1 3

Solution
• The improvement in the second test is clear to see. The medians, which are easily found,
verify this, as do the means.

          Test 1   Test 2
Median    49·5     72·5
Mean      51·5     69·8

Test 1 IQR = 60·5 - 41 = 19·5
Test 2 IQR = 86 - 57 = 29

• The spread of the scores in Test 1 is smaller than in Test 2. The interquartile range
confirms this. The presence of the outlier in Test 1 has made the range an unreliable
measure of spread.

                      Test 1   Test 2
Interquartile range   19·5     29
Range                 60       50

The double box plots clearly show these features.
[Double box plots for Test 2 and Test 1 on a scale from 30 to 100]

Test 1 is positively skewed as more scores are at the 'low end', indicated by the median
being lower than the mean.
Test 2 is negatively skewed as more scores are at the 'high end', indicated by the median
being greater than the mean.
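The measures quoted in this solution can be recalculated with the Python sketch below. The scores are read from the back-to-back stem-and-leaf plot, the function name summarise is made up for this illustration, and the quartiles are taken as the medians of the lower and upper halves (Section 14:03).

```python
from statistics import mean, median


def summarise(scores):
    """Return the median, mean (1 decimal place), IQR and range of the scores."""
    ordered = sorted(scores)
    n = len(ordered)
    q1 = median(ordered[: n // 2])        # median of the lower half
    q3 = median(ordered[(n + 1) // 2 :])  # median of the upper half
    return {
        "median": median(ordered),
        "mean": round(mean(ordered), 1),
        "iqr": q3 - q1,
        "range": ordered[-1] - ordered[0],
    }


test1 = [30, 36, 36, 38, 39, 41, 41, 41, 43, 47, 47, 49,
         50, 53, 53, 55, 58, 58, 63, 65, 67, 68, 69, 90]
test2 = [43, 46, 46, 48, 48, 51, 57, 59, 59, 63, 68, 69, 70,
         75, 75, 75, 78, 79, 82, 86, 87, 87, 90, 90, 91, 93]

print(summarise(test1))
print(summarise(test2))
```

For Test 1 the mean is above the median, and for Test 2 the mean is below the median, which matches the comments on skew.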



1 The age distributions of students in four high schools are shown below.
[Four column graphs, School A, School B, School C and School D: number of students against Age (12 to 18). The vertical scale runs from 0 to 100 for Schools A, B and D and from 0 to 200 for School C.]

a Which schools' age distributions are skewed? What causes the skew?
b Which school's age distribution is closest to being distributed evenly?
c In which school would the mean age of a student be:
i closest to 15 ii below 15 iii over 15 iv the largest?

2 The marks for two classes on the same test are shown in the dot plots below.
[Dot plot for Class 1 on a Score axis from 82 to 100]
[Dot plot for Class 2 on a Score axis from 80 to 100]

a Which set of results is more skewed?


b By just looking at the dot plots, estimate which class has:
i the higher mean ii the greater spread of scores.
c Check your answer to part b by calculating the mean and range for each set of scores.

3 A school librarian was interested in comparing the number of books borrowed by boys and
girls. At the end of the year, she looked at the number of books borrowed by each child and
prepared the following graphs.
[Two grouped frequency histograms, 'Girls' borrowings' and 'Boys' borrowings': frequency (0 to 50) against Number of books borrowed (classes 1-10, 11-20, 21-30, 31-40, 41-50)]

a Describe the shape of the distribution for:


i the girls' borrowings
ii the boys' borrowings.
b What is the first impression that the shapes of the distributions give about the boys' and
girls' borrowings?
c Why do you think two grouped frequency histograms were used to display the results
instead of a back-to-back stem-and-leaf plot?
d What sort of distribution would result if the librarian combined the boys' and girls' results?

4 The stem-and-leaf plot shows the marks of a class on two different topic tests.
a Which set of marks is nearly symmetric?
b Which set of marks has the smaller spread? What measures of spread can you use to
support your answer?
c Calculate the median and mean for each set of marks. What do they suggest about
the class performance on the two tests?

To check the shape of the distribution, turn the stem-and-leaf plot side on.

Class tests
  Topic 1 | Stem | Topic 2
      6 5 |  3   |
    9 3 0 |  4   | 4 6 7 9
9 8 4 3 0 |  5   | 4
9 8 8 6 0 |  6   | 2 3 7
  9 8 4 4 |  7   | 0 8 8 9
      6 5 |  8   | 3 5 8 8 8 9 9
          |  9   | 5



5 Thirty students entered a swimming program hoping to improve their swimming. Before
and after the program, students were rated as non-swimmer (N), weak swimmer (W),
competent swimmer (C), good swimmer (G) or excellent swimmer (E).
[Double column graph 'Program in swimming': Frequency (0 to 20) against Rating (N, W, C, G, E), with columns for before and after]
a What was the mode rating before the program?
b What was the mode rating after the program?
c Before the program, what percentage of students were rated either non-swimmers or
weak swimmers?
d After the program, what percentage of students were rated good or excellent swimmers?
e How would you describe the success of the program?

6 In two problem-solving tests, 5 questions were given to a class. The scores are shown below.

Problem test 1
5 1 3 4 3 4 1 1 2 2 5 3 3 1 1 2 4 3 2 4 4 3 2
Problem test 2
0 2 4 4 2 2 3 0 0 2 4 4 1 3 3 2 2 3 2 2 0 2 2
a Arrange the scores into a frequency distribution table and use frequency histograms to
display the data.
b Calculate the mean and median for each test. What do they suggest about the difficulty of
the tests?
c Both sets of scores have a range of 4. Which set of scores has the greater spread?
Give a reason for your answer.

7 This box plot represents the heights of 30 Year 10 students. The histogram also represents the
heights of 30 students.
[Box plot: Heights (in cm), on an axis from 130 to 180]
[Frequency histogram: Frequency (1 to 6) against class centres (height in cm) 137, 142, 147, 152, 157, 162, 167, 172, 177]
a What information is shown on the box plot?
b Could the information in the histogram represent the same 30 Year 10 students who are
represented in the box plot? Explain.
8 A researcher tested two different brands of batteries to see how long they lasted. Her results
are shown in the double box plots below.
Use the double box plots to compare the performance of Brand X and Brand Y.
[Double box plots for Brand X and Brand Y: Length of time (h), on an axis from 390 to 460]

9 Two groups of adults underwent a simple fitness test. One minute after undergoing
a period of strenuous exercise, their heart rates were measured.
The results are shown in the back-to-back stem-and-leaf plot.
a What does the shape of the stem-and-leaf plot suggest about the data?
b Calculate the median and interquartile range for each group and use them to
compare the results for each group.

Heart rates
          Group 1 | Stem | Group 2
            7 5 5 |  11  |
9 9 9 9 8 8 7 7 6 |  12  | 5 5 6
        6 6 4 4 0 |  13  | 4 6 8 8
            4 2 0 |  14  | 2 3 6 6 7 8 8
                  |  15  | 3 5 5 7
                  |  16  | 2 4
10 A local council was interested in speeding up the time it took to approve applications to build
a house. It looked at the time taken in days to process 40 applications. After reviewing its
procedures and monitoring processes, it then looked at another 40 applications. The results are
shown below.

Before                          After
44 53 38 39 52 41 40 41         40 39 43 42 54 48 46 44
43 43 42 57 47 45 50 50         51 52 44 38 40 40 51 52
68 50 45 42 58 48 40 39         39 46 46 49 52 51 42 43
44 46 52 45 46 53 54 40         40 45 44 39 50 43 48 40
48 47 43 38 43 42 54 55         52 53 38 40 47 44 47 42

a Discuss an appropriate way to organise and display the data.


b What measures of cluster and spread would you use to describe the data?
c How effective have the council's review procedures been in reducing the approval time?



INVESTIGATION 14:05 THE AGEING POPULATION
The Australian Bureau of Statistics collects information on a wide variety of topics. This
information is used by governments to help formulate social policy.

The gradual ageing of Australia's population has caused a rethinking of the government's policy
towards pensions, superannuation and caring for the aged.
POPULATION STRUCTURE, by Age and Sex - 1987 and 2007

[Population pyramid: the percentage of the population in each five-year age group from 0-4 to 85+ (Age in years), with Males (%) on the left and Females (%) on the right, each on a scale from 0 to 5. The 1987 and 2007 outlines are shown on the same diagram.]

Where your answers are percentages, give them correct to one decimal place.

1 In 1987, what percentage of Australians were males aged:


a 10-14 b 40-44 c 70-74?

2 In 2007, what percentage of Australians were males aged:


a 10-14 b 40-44 c 70-74?

3 What do your answers to Questions 1 and 2 suggest about the changes in the male
population over the 20 years from 1987 to 2007? Does this trend also occur in the female
population?
4 a What percentage of Australians were aged 70 and over in:
i 1987 ii 2007?
b If there were 1 920 200 people aged over 70 in 2007, calculate the total population of
Australia in that year. Give your answer to the nearest thousand.

5 The approximate number of Australian females aged 65-69 was 300 100 in 1987 and
401 200 in 2007 and yet, looking at the diagram, the percentage of females in the 65-69 age
group was relatively similar in 1987 and 2007. How is this possible?

MATHS TERMS 14
box-and-whisker plot (box plot)
• a diagram obtained from the five-point summary
• the box shows the middle 50% of scores (the interquartile range)
• the whiskers show us the extent of the bottom and top quartiles as well as the range
[Diagram: a box plot on a scale from 4 to 14]

class centre
• the middle outcome of a class,
e.g. the class 1-5 has a class centre of 3

class interval
• the size of the groups into which the data is organised,
e.g. 1-5 (5 scores); 11-20 (10 scores)

cumulative frequency histogram (and polygon)
• these show the outcomes and their cumulative frequencies
[Diagram: cumulative frequency histogram and polygon, cumulative frequency (0 to 10) against Outcome (3 to 6)]

dot plot
• a graph that uses one axis and a number of dots above the axis

five-point summary
• a set of numbers consisting of the minimum score, the three quartiles and the maximum score

frequency
• the number of times an outcome occurs in the data,
e.g. for the data 3, 6, 5, 3, 5, 5, 4, 3, 3, 6 the outcome 5 has a frequency of 3

frequency distribution table
• a table that shows all the possible outcomes and their frequencies (it usually
is extended by adding other columns such as the cumulative frequency),
e.g.
Outcome   Frequency   Cumulative frequency
3         4           4
4         1           5
5         3           8
6         2           10

frequency histogram
• a type of column graph showing the outcomes and their frequencies
[Diagram: frequency histogram, Frequency (0 to 4) against Outcome (3 to 6)]

frequency polygon
• a line graph formed by joining the midpoints of the top of each column;
to complete the polygon the outcomes immediately above and below those
present are used (the heights of these columns are zero)
[Diagram: frequency polygon, Frequency (0 to 4) against Outcome (3 to 6)]




grouped data
• the organisation of data into groups or classes

interquartile range
• IQR = Q3 - Q1
• the range of the middle 50% of scores
• the median of the upper half of scores minus the median of the lower half of scores

mean
• the number obtained by 'evening out' all the scores until they are equal,
e.g. if the scores 3, 6, 5, 3, 5, 5, 4, 3, 3, 6 were 'evened out', the number
obtained would be 4·3
• to obtain the mean, use the formula:
Mean = (sum of the scores) / (total number of scores)

median
• the middle score for an odd number of scores, or the mean of the middle two
scores for an even number of scores

median class
• in grouped data, the class that contains the median

mode (modal class)
• the outcome or class that contains the most scores

ogive
• this is another name for the cumulative frequency polygon

outcome
• a possible value of the data

outlier
• a score that is separated from the main body of scores

quartiles
• the points that divide the scores up into quarters
• the second quartile, Q2, divides the scores into halves (Q2 = median)
• the first quartile, Q1, is the median of the lower half of scores
• the third quartile, Q3, is the median of the upper half of scores
e.g. 4 5 6 6 7 7 | 7 9 9 11 12 15
Q1 = 6, Q2 = 7, Q3 = 10

range
• the difference between the highest and lowest scores

shape (of a distribution)
• a set of scores can be symmetric or skewed (see pages 441 and 442)

statistics
• the collection, organisation and interpretation of numerical data

stem-and-leaf plot
• a graph that shows the spread of scores without losing the identity of the data

Spreadsheets can be used to analyse data and display information. Statistical formulas
can be entered in cells to add data or to calculate the mean.

STATISTICS
These questions reflect the important skills introduced in this chapter.
Errors made will indicate areas of weakness.
Each weakness should be treated by going back to the section listed.

1 The students of class 9M were given a reading test and rated from 0 (a poor reader)
to 5 (an excellent reader). The results are given below. 14:01, 14:02

4 1 0 2 3 3 3 2 2 1
0 2 2 4 3 5 3 2 1 3
2 0 3 1 3 4 5 1 0 2

a Complete this frequency distribution table.

Outcome (x)   Tally   f   cf
0
1
2
3
4
5
Total:

b What is the frequency of 5?
c How many students were given a rating less than 4?
d On the same diagram, draw the frequency histogram and the frequency polygon.
e On the same diagram draw the cumulative frequency histogram and
the cumulative frequency polygon.
f What is the range of these scores?
g Find the mode, median and mean for these scores.

2 Use your calculator to evaluate the mean for the scores in the following frequency
tables. Give your answer correct to two decimal places. 14:01

a Outcome   Freq.
  27        18
  28        50
  29        23
  30        9

b Outcome   4·1   4·2   4·3   4·4   4·5   4·6   4·7
  Freq.     7     11    16    8     12    7     3

3 These are the scores gained by each team competing in the Lithgow car rally
this year. 14:01, 14:02
27 18 0 45 63 49 50 31 9 26
4 41 38 20 69 38 17 43 16 37
28 14 58 52 37 43 38 51 44 33
25 38 11 43 40 56 62 48 53 22
a Draw a grouped frequency table using classes 0-9, 10-19 etc. Use the columns:
class, class centre, tally, frequency and cumulative frequency.
b Prepare a stem-and-leaf plot for the scores above.



4 Find the median for each of the following.

a Outcome (x)   cf
  4             3
  5             7
  6             10
  7             16
  8             23

b Outcome (x)   cf
  11            7
  12            15
  13            33
  14            53
  15            62

c Outcome (x)   cf
  1             24
  2             37
  3             44
  4             47
  5             50

5 Use the following graphs to calculate the median for each set of data. 14:02
a [Cumulative frequency graph: cumulative frequency (0 to 40) against Outcome (3 to 8)]
b [Cumulative frequency graph: cumulative frequency (0 to 20) against Outcome (5 to 9)]
c [Cumulative frequency graph: cumulative frequency (0 to 40) against Outcome (11 to 15)]

6 a Find the interquartile range of the scores: 14:03
1, 2, 2, 5, 7, 9, 10, 10, 11, 11, 11, 11
b Draw a cumulative frequency polygon using the frequency distribution table
below and use it to find the interquartile range of the scores.

x   10   11   12   13   14   15   16   17   18   19
f   2    0    5    4    5    6    5    6    3    4

c The lengths of 16 fish caught were measured. The results are shown on this dot
plot. What is the interquartile range?
[Dot plot: Length of fish (cm), on an axis from 20 to 34]

d What is the interquartile range of the times shown in the stem-and-leaf plot?

Time
Stem | Leaf
3 | 8 8
4 | 0 3 4 5 8
5 | 1 3 4 4
6 | 0

7 Find the five-point summary for each set of data in Question 6. 14:04
8 Draw box plots for the data in Question 6 a, b and c. 14:04
9 These double box plots were drawn to compare the results of Year 10 in two tests. 14:05
[Double box plots for Test 2 and Test 1, with Q1, Q2 and Q3 marked, on a Score axis from 30 to 90]

a By how much was the median for Test 2 higher than the median of Test 1?
b What was the range and interquartile range of Test 1?
10 Test 1 scores            Test 2 scores
   12 17 19 12 15 10        21 15 18 7 11 16
   9 22 24 11 18 8          20 12 23 12 10 13
   25 15 18 20 18 18        12 19 12 14 20 9
a Draw a dot plot for the scores on Test 1.
b Draw a back-to-back stem-and-leaf plot to compare the scores on Test 1 and
Test 2.
c Draw double box plots to compare the scores on Tests 1 and 2.

This building, nicknamed the Gherkin, is the first


skyscraper in London built using a sustainable
design. Sensors on the building collect data that
is used to control the lighting, ventilation and
heating, which reduces energy usage.



ASSIGNMENT 14A Chapter review
1 The average length of the two index fingers of 24 teachers was recorded.
The results are shown on this dot plot.
[Dot plot: Length of index finger (mm), on an axis from 62 to 82]
a Are any outliers present in this data?
b Find the five-point summary for the data if the outlier is:
i included ii omitted.
c What is the interquartile range if the outlier is:
i included ii omitted?
d Comment on the shape of the distribution (ignore the outlier in this case).

2 A group of Year 9 students sat for a college reading test and
the results were entered in this stem-and-leaf plot.
a How many scored in the 50s?
b What was the mode of the results?
c How many students scored less than 50?
d How many students were tested?
e What was the median of the results?
f If 65 and above is considered to be a passing grade, how many passed?
g Make a grouped frequency table for this data using classes of 30-39, 40-49, ...

Test results, Year 9
(7 | 3 represents 73)
Stem | Leaf
3 | 1 8 2
4 | 6 3 3 1 3
5 | 4 1 1 5
6 | 2 2 8 4 7 7 1
7 | 6 3 4 6 5 5 1 4
8 | 2 6 6 7 3 3 1
9 | 2 9 9 1 0 0

3 Year 3 and Year 4 students were tested on their
knowledge of multiplication tables. The results are
shown in this back-to-back stem-and-leaf plot.
Compare the results of Year 3 and Year 4 on this
test. (You will need to refer to at least one measure
of cluster and one measure of spread.)

Test scores (5 | 1 represents 51)
Year 3 results | Stem | Year 4 results
          9830 |  3   | 5
       8874222 |  4   |
         87740 |  5   | 14669
           763 |  6   | 02258
             1 |  7   | 3688999
               |  8   | 00
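
One way to begin the comparison in question 3 is to expand each side of the plot into a list of scores and compute a measure of centre and a measure of spread. The sketch below uses the median and the range; the dictionaries and the helper `to_scores` are illustrative names, and the leaves are read from the plot as extracted.

```python
from statistics import median

# Leaves copied from the plot, keyed by stem (tens digit).
year3 = {3: "9830", 4: "8874222", 5: "87740", 6: "763", 7: "1", 8: ""}
year4 = {3: "5", 4: "", 5: "14669", 6: "02258", 7: "3688999", 8: "00"}


def to_scores(plot):
    """Expand a {stem: leaves} mapping into a sorted list of scores."""
    return sorted(10 * stem + int(leaf)
                  for stem, leaves in plot.items()
                  for leaf in leaves)


for name, plot in (("Year 3", year3), ("Year 4", year4)):
    scores = to_scores(plot)
    print(name, "n =", len(scores), "median =", median(scores),
          "range =", scores[-1] - scores[0])
```

The interquartile range could be used instead of the range if a measure of spread that ignores extreme scores is preferred.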

4 a For the data shown in this histogram, determine the:
i range
ii median
iii interquartile range
iv five-point summary.
b Use this information to construct a box plot.
[Figure: histogram; vertical axis 'Frequency', horizontal axis 'Score' marked from 0 to 10]

5 After the Year 8 semester exam the maths
staff organised the 99 marks into a grouped
frequency distribution. The results are
shown in the table.
a Copy and complete the grouped
frequency distribution table.
Use it to find:
i the modal class ii the mean.
b Construct an ogive and use it to find
the median class.

Class    Class centre (c.c.)    f    c.f.    f × c.c.
10-19                            2
20-29                            9
30-39                           10
40-49                            8
50-59                           16
60-69                           20
70-79                           13
80-89                           14
90-99                            7
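
The columns of the table in question 5 can be filled in mechanically. The sketch below takes the class centre as the midpoint of each class, accumulates the frequencies, and estimates the mean as the sum of f × c.c. divided by the sum of f. The variable names are illustrative, and the median class is read here from the cumulative frequencies rather than from a drawn ogive.

```python
from itertools import accumulate

# Class limits and frequencies copied from the table.
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59),
           (60, 69), (70, 79), (80, 89), (90, 99)]
freqs = [2, 9, 10, 8, 16, 20, 13, 14, 7]

centres = [(low + high) / 2 for low, high in classes]   # class centre (c.c.)
cum_freqs = list(accumulate(freqs))                      # cumulative frequency (c.f.)
products = [f * c for f, c in zip(freqs, centres)]       # f x c.c.

total = sum(freqs)
mean = sum(products) / total
modal_class = classes[freqs.index(max(freqs))]

# The median is the middle score: find the first class whose c.f. reaches it.
middle = (total + 1) / 2
median_class = next(c for c, cf in zip(classes, cum_freqs) if cf >= middle)

for (low, high), c, f, cf, p in zip(classes, centres, freqs, cum_freqs, products):
    print(f"{low}-{high}  {c:5.1f}  {f:3d}  {cf:3d}  {p:7.1f}")
print("modal class:", modal_class, "mean:", round(mean, 1),
      "median class:", median_class)
```

Because the individual marks are not known, the mean found this way is an estimate based on the class centres.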

ASSIGNMENT 14B Working mathematically


1 Peter is now twice as old as Paul was when
Peter was as old as Paul is now. How old
is Peter if their ages add to 91?

2 Find the perimeter and area of this figure.
[Figure: shape with sides labelled 5·4 cm, 3·6 cm and 4 cm]

3 When this net is folded to
form a cube, the numbers on
the three faces that meet at
each vertex are multiplied together.
What is the smallest product possible?
[Figure: net of a cube with numbered faces; the labels as extracted are 5, 7, 3, 4, 2 and 5]

4 In a 360-minute tennis match, 4 players
were always on the court. There were
12 players (6 from each team), and they
were each on the court for the same
amount of time. How many minutes did
each player spend on the court?

5 The scores of all players in a recent golf
tournament are recorded in the following
table. Explain what calculations you would
use to rate the holes and list them in order
of degree of difficulty (1 the hardest to 4
the easiest). In golf, 'par' is the number of
strokes that are allocated to complete a
hole and a lower score is better than a
higher score.

Hole   Par        Golfer's scores
               2     3     4     5     6     7
  2     4      1    51   310    85     9     -
  7     5      -    20   211   198    23     4
 11     3     25   269   156     4     -     -
 16     3     50   292    99    13     -     -

6 Choose the heading from the list below
that would best fit each graph.
[Figure: four graphs, a to d, each showing Height against Time]
A The hook of a fishing line while fishing.
B An arrow fired into the air.
C Flying a kite.
D Position of my head while pole vaulting.
E A parachute jump.
F Position of my foot as I kick a ball.
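
For the golf table in question 5, one possible rating (not the only one) is the average number of strokes taken over par on each hole. The sketch below assumes that the stray "85" in the extracted table is the number of golfers who scored 5 on hole 2; the names `holes` and `mean_over_par` are illustrative.

```python
# Assumption: the stray "85" in the extracted table is hole 2's count for a score of 5.
# Each entry maps a hole to (par, {score: number of golfers}).
holes = {
    2: (4, {2: 1, 3: 51, 4: 310, 5: 85, 6: 9}),
    7: (5, {3: 20, 4: 211, 5: 198, 6: 23, 7: 4}),
    11: (3, {2: 25, 3: 269, 4: 156, 5: 4}),
    16: (3, {2: 50, 3: 292, 4: 99, 5: 13}),
}


def mean_over_par(par, counts):
    """Average number of strokes taken above (+) or below (-) par."""
    golfers = sum(counts.values())
    strokes = sum(score * n for score, n in counts.items())
    return strokes / golfers - par


# Hardest hole first: the larger the average over par, the harder the hole.
ranking = sorted(holes, key=lambda h: mean_over_par(*holes[h]), reverse=True)
for rank, hole in enumerate(ranking, start=1):
    print(rank, "hole", hole, round(mean_over_par(*holes[hole]), 3))
```

Other measures, such as the percentage of golfers who scored par or better, may order the holes differently, which is why the question asks for the calculation to be explained.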



ASSIGNMENT 14C Cumulative revision
1 a Find the value of x, correct to one decimal place. 13:05
[Figure: diagram for part a with lengths labelled 40 cm and x cm]
b The two short sides of a right-angled triangle are 5 m and
8 m in length. Find the sizes of the angles
in the triangle to the nearest degree.
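
Part b of question 1 can be checked numerically using the tangent ratio. This is a minimal sketch; the variable names are illustrative, and `math.atan2` is used only as a convenient inverse tangent for the ratio opposite/adjacent.

```python
import math

# Part b: the two short sides of the right-angled triangle, in metres.
short_a, short_b = 5, 8

# tan(angle) = opposite / adjacent, so the angle opposite the 5 m side is atan(5/8).
angle_a = math.degrees(math.atan2(short_a, short_b))
# The two acute angles of a right-angled triangle add to 90 degrees.
angle_b = 90 - angle_a

print(round(angle_a), round(angle_b), 90)
```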

2 On a photocopier the enlargement and reduction factors are given as percentages. 11 :01
Find the enlargement and reduction factor for the following.
a Enlarge a photograph that is 18 cm long and 13 cm wide so that it is 28·8 cm
long and 20·8 cm wide.
b Reduce a drawing that is a square of side 16 cm to a square of side 12 cm.
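
The factor in question 2 is the new length as a percentage of the original length. The sketch below uses a hypothetical helper, `scale_percent`, to show the calculation.

```python
def scale_percent(original_cm, copy_cm):
    """Photocopier setting: copy length as a percentage of the original length."""
    return copy_cm / original_cm * 100


# Part a: both dimensions must give the same factor for the photo to keep its shape.
print(round(scale_percent(18, 28.8), 1), round(scale_percent(13, 20.8), 1))
# Part b: a factor below 100% is a reduction.
print(round(scale_percent(16, 12), 1))
```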
3 a Find the gradient and y-intercept of the 9:05, 9:06, 10:02A
line AB and use it to write its equation.
b Find the equation of the line DC and use
it to find the x-intercept of the line.
c Use simultaneous equations to find the
point of intersection of line AB with
the line 2x + y = 4.
[Figure: number plane showing the lines AB and DC]

4 Factorise these expressions: 12:06
a $x^2 + 3x - 10$ b $a^2 - a - 56$ c $4y^2 - 9$
d $x^2 - 3x + ax - 3a$ e $15n^2 - n - 2$
5 Simplify these expressions involving algebraic fractions. 12:07, 12:08
a $\frac{x^2 - 2x - 3}{x^2 - 5x + 6}$
b $\frac{x^2 - 9}{x + 5} \times \frac{x^2 + 7x + 10}{x^2 + 5x + 6}$
c $\frac{5}{x^2 - 4}$ [operator not legible] $\frac{4}{x^2 + 2x - 8}$
6 Bag A contains 32 red and 48 green marbles. Bag B contains 18 red and 4:03
26 green marbles.
a Calculate the probability of choosing a red marble from Bag A.
b What is the probability of choosing a red marble from Bag B?
c Which bag gives the best chance of choosing a red marble?
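
The comparison in question 6 comes down to two fractions. This sketch keeps them exact with Python's `fractions` module; the dictionary names are illustrative.

```python
from fractions import Fraction

# (red, green) marble counts from the question.
bags = {"A": (32, 48), "B": (18, 26)}

# P(red) = number of red marbles / total number of marbles in the bag.
p_red = {name: Fraction(red, red + green) for name, (red, green) in bags.items()}

for name, p in p_red.items():
    print(f"Bag {name}: P(red) = {p} = {float(p):.3f}")
print("Better chance of red: Bag", max(p_red, key=p_red.get))
```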

7 At the end of the financial year Hannah's total income is $71 325. She has allowable 8:04
tax deductions of $3327 and throughout the year she has made PAYG deductions
totalling $14 924. Calculate:
a her taxable income
b the tax payable on her taxable income if she has to pay $4650 plus 30c for each
dollar over $37 000
c the Medicare levy, which is 1·5% of her taxable income
c the Medicare levy, which is 1·5% of her taxable income
d the refund due or the amount of tax still payable.
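
The four parts of question 7 follow one after another, so they can be laid out as a short calculation. This sketch uses `Decimal` to keep the cents exact and applies only the tax rule stated in the question; the variable names are illustrative.

```python
from decimal import Decimal

income = Decimal("71325")
deductions = Decimal("3327")
payg_paid = Decimal("14924")

# Part a: taxable income is total income less allowable deductions.
taxable_income = income - deductions
# Part b: $4650 plus 30c for each dollar over $37 000.
tax = Decimal("4650") + Decimal("0.30") * (taxable_income - 37000)
# Part c: Medicare levy of 1.5% of taxable income.
medicare = Decimal("0.015") * taxable_income
# Part d: compare what is owed with the PAYG amounts already paid.
balance = tax + medicare - payg_paid

print("taxable income:", taxable_income)
print("tax payable:", tax)
print("Medicare levy:", medicare)
# A positive balance is still owed; a negative balance would be a refund.
print("still payable:" if balance > 0 else "refund:", abs(balance))
```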
