CH 14 Statistics
Contents
14:01 Review of statistics
Investigation 14:01 Adding and averaging
14:02 Cumulative frequency
14:03 Measures of spread: Interquartile range
Fun spot 14:03 Why did the robber flee from the music store?
14:04 Box plots
Investigation 14:04 Code breaking and statistics
14:05 Comparing sets of data
Investigation 14:05 The ageing population
Maths terms, Diagnostic test, Assignments
-----------------------------------------------------------------------------------------------
Syllabus references (See pages x-xv for details.)
Statistics and Probability
Selections from Single Variable Data Analysis [Stages 5.1, 5.2°]
• Construct back-to-back stem-and-leaf plots and histograms and describe data, using terms including 'skewed',
'symmetric' and 'bi-modal' (ACMSP282)
• Compare data displays using mean, median and range to describe and interpret numerical data sets in terms of
location (centre) and spread (ACMSP283)
• Determine quartiles and interquartile range (ACMSP248)
• Construct and interpret box plots and use them to compare data sets (ACMSP249)
• Compare shapes of box plots to corresponding histograms and dot plots (ACMSP250)
-----------------------------------------------------------------------------------------------
Working Mathematically
• Communicating • Problem Solving • Reasoning • Understanding • Fluency
In Years 7 and 8 the work in statistics concentrated on the classification, collection, organisation
and analysis of data.
WORKED EXAMPLE 1
Find the range, mode, median and mean of each set of scores.
a 4 4 4 12 9 6 10
b 15 36 40 23 18 46 21 28 32 36
Solutions
a Range = highest score - lowest score
        = 12 - 4
        = 8
  Mode = score occurring most
       = 4
  Median = middle score
         = 6
  Mean = (sum of the scores) ÷ (total number of scores)
       = (4 + 4 + 4 + 12 + 9 + 6 + 10) ÷ 7
       = 7
b Range = highest score - lowest score
        = 46 - 15
        = 31
  Mode = score occurring most
       = 36
  Median = average of two middle scores
         = (28 + 32) ÷ 2
         = 30
  Mean = 295 ÷ 10
       = 29·5
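The four measures in Worked Example 1 can be checked with a short Python sketch. The function name `summarise` is an illustrative choice, not part of the text; `multimode` returns a list because a set of scores can have more than one mode.

```python
from statistics import mean, median, multimode

def summarise(scores):
    """Return (range, list of modes, median, mean) for a list of scores."""
    score_range = max(scores) - min(scores)
    return (score_range, multimode(scores), median(scores), mean(scores))

set_a = [4, 4, 4, 12, 9, 6, 10]
set_b = [15, 36, 40, 23, 18, 46, 21, 28, 32, 36]

print(summarise(set_a))  # range 8, mode 4, median 6, mean 7
print(summarise(set_b))  # range 31, mode 36, median 30, mean 29.5
```

Note that `median` sorts the scores itself, so the sets can be entered in the order given.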
Solution
Mark (x)    Frequency (f)    fx
3           5                15
4           3                12
5           7                35
6           8                48
7           4                28
8           2                16
Total:      29               154
• The mode is the score that occurs the most. Here, the mode is 6.
• The median is the middle score when the scores are arranged in order. As there are 29 scores, the 15th score will be the middle score. Counting down the frequency column, it can be seen that the 15th score is a 5. Hence, the median is 5.
• The mean = (sum of fx column) ÷ (sum of f column)
           = Σfx ÷ Σf
           = 154 ÷ 29
           = 5·3 (1 dec. pl.)
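The mean and mode from this frequency table can be reproduced with a minimal sketch. The dictionary layout and the function names are assumptions made for illustration only.

```python
# Frequency table from the solution above: mark -> frequency
table = {3: 5, 4: 3, 5: 7, 6: 8, 7: 4, 8: 2}

def table_mean(freq):
    """Mean = (sum of fx) / (sum of f)."""
    total_f = sum(freq.values())
    total_fx = sum(x * f for x, f in freq.items())
    return total_fx / total_f

def table_mode(freq):
    """Outcome with the largest frequency (the first one found if there is a tie)."""
    return max(freq, key=freq.get)

print(round(table_mean(table), 1))  # 154 / 29, rounded to 1 decimal place
print(table_mode(table))            # the mark with the highest frequency
```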
WORKED EXAMPLE 3
Class quiz
Draw a frequency histogram and polygon for the data in
Worked Example 2.
Solution
• The frequency histogram is a column graph.
WORKED EXAMPLE 4
The scores for a group of 45 students on a spelling test out of 60 were:
50 41 34 25 18 8 35 45 54 14 59 28 39 42 53
34 51 38 47 21 50 9 54 57 46 10 48 34 11 40
52 8 23 42 52 46 37 27 55 17 32 41 30 25 11
a Sort this set of data into a grouped frequency distribution table with groupings of 0-9,
10-19 etc.
b Find the modal class, median class and an estimate for the mean using the class centres for
each group.
c Construct a stem-and-leaf plot and use this to find the mode, median and mean. Compare
these measures with those for the grouped data.
Solutions
a Class    Class centre (c.c.)    Frequency (f)    f × c.c.
  0-9      4·5                    3                13·5
  10-19    14·5                   6                87
  20-29    24·5                   6                147
  30-39    34·5                   9                310·5
  40-49    44·5                   10               445
  50-59    54·5                   11               599·5
  Total:                          Σf = 45          Σ(f × c.c.) = 1602·5
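The grouped measures asked for in part b can be sketched from the table above. This is an illustration under stated assumptions: the tuple layout and function names are invented here, and the median class is taken as the class containing the middle score, position (n + 1) ÷ 2.

```python
# (low, high, frequency) for each class in the grouped table
classes = [(0, 9, 3), (10, 19, 6), (20, 29, 6),
           (30, 39, 9), (40, 49, 10), (50, 59, 11)]

def class_centre(low, high):
    return (low + high) / 2

def grouped_mean(rows):
    """Estimate the mean as sum(f x c.c.) / sum(f)."""
    total_f = sum(f for _, _, f in rows)
    total_fcc = sum(f * class_centre(low, high) for low, high, f in rows)
    return total_fcc / total_f

def modal_class(rows):
    low, high, _ = max(rows, key=lambda row: row[2])
    return (low, high)

def median_class(rows):
    """Class in which the cumulative frequency first reaches the middle position."""
    total_f = sum(f for _, _, f in rows)
    middle = (total_f + 1) / 2
    running = 0
    for low, high, f in rows:
        running += f
        if running >= middle:
            return (low, high)

print(round(grouped_mean(classes), 1))
print(modal_class(classes), median_class(classes))
```

The estimate uses class centres only, so it will generally differ a little from the mean of the 45 individual scores.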
In particular, the median and mean indicated significant overall improvement in the scores.
This can also be seen by comparing the distribution of the scores in the back-to-back plot.
D Determine the: i range, ii mode, iii median and iv mean for each set of scores.
a 5,9,2,7,5,8,4 b 5,8,5,7,8,5,9,7
c 21, 24, 19, 25, 24 d 1·3, 1·5, 1·1, 1·5, 1·6, 1·4, 1·7, 1·9
fl Use your calculator to evaluate x̄ (the mean) for each set of scores.
a 6,9,7,8,5 b 61,47,56,87,91 c 8, 8, 8, 8, 8, 8
4,9,6,5,4 44,59,65,77,73 6,6,6,6,6,6
3,8,8,5,7 49,39,82,60,51 7,7, 7,7,7,7
6,5,7,5,4 84,73,67,65,55 9,9,9,9,3,3
d Outcome Frequency e Outcome Frequency f Outcome Frequency
1 6 48 6 12 2
2 9 49 11 13 15
3 11 50 27 14 43
4 7 51 15 15 67
5 2 52 8 16 27
53 3 17 8
[Figure: frequency histogram; horizontal axis: Number of children (0 to 7)]
[Figure: dot plot; horizontal axis: Score (16 to 30)]
b Time taken (minutes) to complete a bicycle race:
Stem Leaf
3 8
4 269
5 3367
6 0133478
7 46689
8 3489
II Each golfer in a tournament completed three rounds of golf. The scores for each round were tabulated in this grouped frequency distribution table.
Class    Class centre (c.c.)    Frequency (f)    f × c.c.
64-66    65                     25
67-69                           74
70-72                           73
73-75                           24
76-78                           2
Total:
a Copy and complete this table.
b How many golfers competed in this tournament?
c What was the maximum possible range?
d What is the modal class?
e Use the class centres to find an approximation for the mean.
f Determine the median class.
IE Year 9's exam results have been organised into an ordered stem-and-leaf plot using a class size of 5 as shown.
Stem    Leaf            f    c.c.    f × c.c.
6(5)    88999                67
7(0)    000012234
7(5)    555566777788
8(0)    00011112344
8(5)    58
9(0)    00012334
9(5)    789
Totals:
a Complete the frequency column and use it to determine the modal class.
b Complete the 'class centre' and 'frequency times class centre' columns. Use the totals to calculate an approximation for the mean.
c Determine the actual median for these results using the leaf column.
d Determine the median class using the frequency column.
INVESTIGATION 14:01 ADDING AND AVERAGING
B10   =SUM(B2:B8)
   A B C D E F G H I J K L M N O
1  DAY     TAKINGS   DAY     TAKINGS   DAY      TAKINGS   DAY      TAKINGS   DAY      TAKINGS
2  27-Oct  $2,490    3-Nov   $3,260    10-Nov   $2,800    17-Nov   $2,570    24-Nov   $2,170
3  28-Oct  $4,360    4-Nov   $4,040    11-Nov   $4,690    18-Nov   $2,920    25-Nov   $3,640
4  29-Oct  $1,440    5-Nov   $1,420    12-Nov   $1,520    19-Nov   $2,360    26-Nov   $1,420
5  30-Oct  $1,660    6-Nov   $1,960    13-Nov   $1,340    20-Nov   $1,100    27-Nov   $1,350
6  31-Oct  $1,370    7-Nov   $1,180    14-Nov   $1,900    21-Nov   $1,170    28-Nov   $1,480
7  1-Nov   $1,430    8-Nov   $1,230    15-Nov   $1,440    22-Nov   $1,700    29-Nov   $1,350
8  2-Nov   $1,860    9-Nov   $1,510    16-Nov   $1,660    23-Nov   $1,550    30-Nov   $1,440
9
10         $14,610
11
12
Part of a spreadsheet is shown above. It shows the daily takings for the Lazy Lizard Cafe for the
5-week period from 27 October until 30 November.
• Enter this information into a spreadsheet.
Note the formula =SUM(B2:B8) has been entered in cell B10. This gives the weekly takings by
adding the numbers in cells B2 to B8.
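For readers without a spreadsheet, the same addition can be sketched in Python. The list below copies cells B2 to B8 (the first week only); the variable names are illustrative assumptions.

```python
# Takings for 27 Oct to 2 Nov, copied from cells B2 to B8
week_1 = [2490, 4360, 1440, 1660, 1370, 1430, 1860]

weekly_total = sum(week_1)                  # plays the role of =SUM(B2:B8)
daily_average = weekly_total / len(week_1)  # average takings per day that week

print(weekly_total)
print(round(daily_average, 2))
```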
1 Write formulas to give the weekly takings for the other 4 weeks.
2 Now write a formula that could give you the total takings for the 5-week period.
Each row of the spreadsheet gives the takings for the same day of the week. For example, the
days are all Saturdays in row 2 and all Sundays in row 3.
WORKED EXAMPLE
For a class of 26 students the following marks out of 10 were obtained in a test.
Outcome (x)    Frequency (f)    Cumulative frequency (cf)
3              3                3
4              3                6
5              4                10
6              5                15
7              4                19
8              2                21
9              3                24
10             2                26
Total:         26
• 15 students scored 6 or less. Since 4 students scored 7, the cumulative frequency of 7 is 15 + 4 or 19.
• The last figure in the cf column must be equal to the sum of the frequencies, as all students are on or below the highest outcome.
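The cumulative frequency column is a running total of the frequency column. A minimal sketch using the 26-student table from this worked example (list names are illustrative):

```python
from itertools import accumulate

outcomes = [3, 4, 5, 6, 7, 8, 9, 10]
freqs = [3, 3, 4, 5, 4, 2, 3, 2]

# Each cumulative frequency is the previous one plus the current frequency
cum_freqs = list(accumulate(freqs))

for x, f, cf in zip(outcomes, freqs, cum_freqs):
    print(x, f, cf)
```

The last value in `cum_freqs` equals the total number of students, matching the second dot point above.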
Finding the median from a frequency distribution table
The cumulative frequency can be used to find the median of a set of scores.
WORKED EXAMPLES
Outcome (x)    Frequency (f)    Cumulative frequency
3              5                5
4              3                8
5              7                15
6              8                23
7              4                27
8              2                29
The middle score is the 15th score (14 above it and 14 below it).
The 15th score is a 5.
∴ Median = 5
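Counting down the cumulative frequency column to the middle score can be sketched as follows. The helper names are hypothetical; for an even number of scores the sketch averages the two middle scores, as in the review section.

```python
def nth_score(outcomes, freqs, k):
    """Return the k-th score (counting from 1) when all scores are listed in order."""
    running = 0
    for outcome, f in zip(outcomes, freqs):
        running += f  # cumulative frequency up to and including this outcome
        if running >= k:
            return outcome
    raise ValueError("k is larger than the number of scores")

def median_from_table(outcomes, freqs):
    n = sum(freqs)
    if n % 2 == 1:
        return nth_score(outcomes, freqs, (n + 1) // 2)
    lower = nth_score(outcomes, freqs, n // 2)
    upper = nth_score(outcomes, freqs, n // 2 + 1)
    return (lower + upper) / 2

# 29 scores: the 15th score is the median
print(median_from_table([3, 4, 5, 6, 7, 8], [5, 3, 7, 8, 4, 2]))
# 26 scores (the class test table): average of the 13th and 14th scores
print(median_from_table([3, 4, 5, 6, 7, 8, 9, 10], [3, 3, 4, 5, 4, 2, 3, 2]))
```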
9 5 21
10 1 22
10 3 30
WORKED EXAMPLE 1
To find the median, follow these steps.
• Find the half-way point (1/2 × 26 = 13).
• Draw a horizontal line from this point to the ogive.
• Then draw a vertical line to meet the horizontal axis.
• This meets the horizontal axis within the '6' column.
∴ The median is 6.
[Figure: cumulative frequency histogram and polygon (ogive); vertical axis: Cumulative frequency (0 to 26); horizontal axis: Outcome (3 to 10)]
For grouped data, the outcomes are grouped into classes with a representative class centre (c.c.).
The horizontal axis usually shows these class centres and the median class is found as above.
Alternatively, an estimate for a median could be read from the horizontal axis, as shown below.
WORKED EXAMPLE 2
The percentage results for sixty students in an examination are given in this table.
When constructing frequency diagrams for grouped data, the only point to note is that the columns are indicated on the horizontal axis by the class centres. The diagram for the worked example above would look like this.
[Figure: cumulative frequency diagram for the grouped data; vertical axis: Cumulative frequency (0 to 60)]
• The cumulative frequency polygon can be drawn
The cumulative frequency of an outcome gives the number of outcomes equal to, or less than, that outcome.
D Calculate the total of the frequency column and complete the cumulative frequency column
in each of these tables.
1 8 11 10 13
2 11 22 11 22
3 17 12 30
4 9 13 21
5 2 14 13
Total: Total:
3 8 30 9
4 14 35 7
5 20 40 10
6 31 45 15
7 32 50 8
8 28 55 10
9 11 60 2
10 5 65 4
Total: Total:
5
6
7
8
9
10
Total: Σf =        Σfx =
[Figure: cumulative frequency graph; horizontal axis: Score (5 to 10)]
From the table, determine the:
a mode
b mean
c median
d range
To find the mode, use the f column. To find the median, use the c.f. column. To find the mean, use the fx and f columns.
El Five coins were tossed many times and the number of
20
heads recorded. The cumulative frequency for each
number of heads was calculated and a cumulative 18
0 1 2 3 4 5
Number of heads
Sharon organises her family's football tipping competition. Each week Dad, Sharon, Adam and
Bron have to pick the results of the 7 rugby league matches played. The table below shows the
results for rounds 1 to 5.
Round 1 4 4 6 6 4 4 4
Round 2 3 7 3 9 4 3 7
Round 3 6 13 4 13 6 5 12
Round 4 4 17 4 17 2 3 15
Round 5 5 22 4 21 5 4 19
a 'Prog. Total' is short for 'Progressive Total'. It is like a cumulative frequency column.
Complete Dad's Prog. Total column.
b Who was leading the competition at the end of
i round 1 ii round 3 iii round 5?
c What has been the highest score achieved in a round? How many times has this happened?
d Who has had the lowest score in a round?
e In round 6 the scores were: Sharon 6, Adam 6, Dad 7, Bron 5. Use these results to add the
next line in the table.
Two dice were rolled one hundred times and the total showing on the two upper faces was
recorded to obtain this set of scores.
5 7 6 12 10 2 4 5 7 9 7 6 4 3 5 8 6 3 5 6
5 8 7 9 6 8 9 4 8 7 8 4 8 4 8 7 6 7 10 5
9 5 6 5 2 9 5 9 11 10 6 7 7 7 10 6 11 10 7 8
8 3 9 3 5 8 7 12 10 9 7 8 7 5 6 4 5 8 9 11
10 6 9 6 7 8 9 10 11 3 6 4 7 2 4 8 8 4 6 7
II In the game of golf, a par is the number of strokes allocated to complete a given hole. Holes
can only have a par of 3, 4 or 5 strokes. If a par is not scored, the score is said to be either
under or over par. Different holes can be rated for difficulty by analysing players' scores on
the hole.
The tables below show the scores achieved by all the players in a recent British Open on
two holes.
The table below shows the players' scores for the second round in the same British Open golf
tournament. The par for the course (sum of the pars for all 18 holes) is 71.
Score 68 69 70 71 72 73 74 75 76 77 78 79
Frequency 1 0 6 9 13 16 8 8 8 3 0 1
Cumulative
frequency
Complete the cumulative frequency column and use it (or some other method) to answer the
following questions.
a How many players scored under par (lower than 71)?
b How many players scored par or better (71 or lower)?
c How many players scored worse than par (higher than 71)?
EJ Use the ogive to find the median from each graph.
[Figures a to f: cumulative frequency histograms and ogives. a Mark out of 5; b Mark out of 14; c Mark out of 10; d, e and f Mark out of 8. Vertical axes: Cumulative frequency; horizontal axes: Outcome]
23-29 26 18 25
30-36 33 18 43
37-43 40 15 58
44-50 47 8 66
51-57 54 4 70
ID] From this frequency histogram:
a determine what the class groupings must have been if the
each class
b construct a cumulative frequency histogram and ogive
[Figure: frequency histogram; horizontal axis: Results (10, 19, 28, 37, 46)]
c.c.   2   7   12   17   22   27   32
f
c determine the modal class
d calculate the mean.
[Figure: cumulative frequency graph; vertical axis: Cumulative frequency (0 to 24); horizontal axis: Outcome (2 to 32)]
IE The exam results for Year 9 students have been collated in this grouped frequency distribution table.
Class     Class centre (c.c.)    Frequency (f)    f × c.c.    Cumulative frequency
31-40     35·5                   17               603·5       60
41-50     45·5                   20               910         80
51-60     55·5                   17               943·5       97
61-70     65·5                   21               1375·5      118
71-80     75·5                   16               1208        134
81-90     85·5                   18               1539        152
91-100    95·5                   8                764         160
Total:                           Σf = 160         Σ(f × c.c.) = 8110
a What is the greatest possible range for this data?
b What is the modal class?
c Calculate a value for the mean.
d Construct a cumulative frequency histogram and ogive.
e From your graph, determine the median class.
f Considering part e, what would be a reasonable single numerical value for the median?
g What percentage of students obtained: i more than 80 ii 20 or less iii more than 50?
Weather bureaus around the world keep statistics on many aspects of weather.
Average rainfall and average temperature are often quoted in weather broadcasts.
• So far we have concentrated on finding the measures of central tendency: mode, mean and
median. These values tell us how the scores tend to cluster.
• We have used the range as a measure of spread. But, as seen in the Prep quiz above, the range is
easily affected by an outlier.
• A much better measure of spread than the range is the interquartile range (IQR). This is the
range of the middle 50% of scores.
WORKED EXAMPLE 1
Find the interquartile range of the scores:
1, 2, 2, 5, 7, 9, 10, 10, 11, 11, 11, 11
[Figure: dot plot of the scores; horizontal axis: Score (1 to 12)]
Method 1
• Make sure that the scores are in ascending order.
• Divide the scores into four equal groups. (This is not always possible. See Worked Example 2.)
The 1st quartile (Q1) is 3·5, which lies half-way between 2 and 5.
The 2nd quartile (median) is 9·5, which lies half-way between 9 and 10.
The 3rd quartile (Q3) is 11.
• The interquartile range is the difference between the 3rd and 1st quartiles.
Interquartile range = Q3 - Q1
                    = 11 - 3·5
                    = 7·5
Method 2
• Construct a cumulative frequency polygon.
• Come across from the vertical axis to the polygon from positions representing 25%, 50% and
75% of the scores. Take the readings on the horizontal axis to obtain the 1st quartile, median
and 3rd quartile.
x     f    cf
1     1    1
2     2    3
3     0    3
4     0    3
5     1    4
6     0    4
7     1    5
8     0    5
9     1    6
10    2    8
11    4    12
Σf = 12
[Figure: cumulative frequency polygon; horizontal axis: Score. Marked heights: 3 is 1/4 of 12, 6 is 1/2 of 12, 9 is 3/4 of 12]
The 1st quartile (Q1) is somewhere in the range 2·5 to 4·5, as the polygon has the height 3 for those values. We resolve this problem by taking the average of 2·5 and 4·5.
∴ Q1 = (2·5 + 4·5) ÷ 2
     = 3·5
Quartile is similar to the quarter. Quartiles divide the data into four equal groups.
The interquartile range is more useful when the number of scores is large. When the
number of scores is small (e.g. ? ) , it is hard to define 'the middle half of the scores'.
Solution
When the number of scores in a set is not a multiple of 4, they cannot be divided into 4 equal groups.
Set A has 11 scores. Hence, the middle score, 9, is the median (Q2).
The middle score of the bottom 5 scores is Q1.
The middle score of the top 5 scores is Q3.
1, 2, (2), 5, 7, (9), 10, 10, (11), 11, 11
1st quartile Q1 = 2     Median Q2 = 9     3rd quartile Q3 = 11
The interquartile range = Q3 - Q1
                        = 11 - 2
                        = 9
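Method 1 can be sketched in Python as "the median of each half". This is an illustration, not a library routine: the function names are invented here, and when the number of scores is odd the middle score is left out of both halves, as in the solution above. Other conventions for quartiles exist (for example, `statistics.quantiles` interpolates differently), so results from software may not match hand calculations.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd number of scores, skip the middle score itself
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

def interquartile_range(scores):
    q1, _, q3 = quartiles(scores)
    return q3 - q1

example_1 = [1, 2, 2, 5, 7, 9, 10, 10, 11, 11, 11, 11]
set_a = [1, 2, 2, 5, 7, 9, 10, 10, 11, 11, 11]

print(quartiles(example_1), interquartile_range(example_1))
print(quartiles(set_a), interquartile_range(set_a))
```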
Use Method 1 (Worked Example 1 on page 431) to find the interquartile range of each set of
scores. (Rewrite the scores in order as the first step in each case.)
a 6,4,3,8,5,4,2,7
b 1,5,2,6,3,8,7,5,4,5,7,9
c 60,84,79,83,94,88,92,99,80,90,95,78
d 15,43,30,22,41,30,27,25,28,20, 19,22,25,24,33,31,41,40,49,37
e 56,83,60,72,61,52,73,24,88,70,57,63,60,48,36,53,65,49,62,65
fl The scores of 32 students have been used to graph this cumulative frequency histogram and polygon. Use the graph to find:
a the median, Q2
b the 1st quartile, Q1
[Figure: cumulative frequency histogram and polygon; vertical axis: Cumulative frequency (0 to 32); horizontal axis: Score (0 to 10)]
16 3 35 3
17 4 36 4
18 5 37 7
19 5 38 10
20 3 39 18
40 18
Use the cumulative frequency polygons to find the interquartile range of each set of scores.
[Figures a and b: cumulative frequency polygons; vertical axes: Cumulative frequency (0 to 30); horizontal axes: Score (a: 11 to 18, b: 1 to 8)]
b 3rd quartile
c interquartile range.
[Figure: cumulative frequency graph; horizontal axis: Weight (kg), marked 52, 57, 62, 67, 72, 77, 82, 87]
D Find the quartiles for each of the following sets of data and then find the interquartile range.
(Note that in both the dot plot and the stem-and-leaf plot, the scores have already been
arranged in order.)
a [Figure: dot plot; horizontal axis: Score (16 to 30)]
b Stem    Leaf
  3       8
  4       269
  5       3367
  6       0133478
  7       46689
  8       3489
In Stage 4, the dot plot and stem-and-leaf plot were used to illustrate certain aspects of a set of
scores or distribution.
Another type of display is the box-and-whisker plot, or more simply, box plot. This is drawn using
a five-point summary of the data as shown below.
1 The minimum score
2 The first quartile, Q1
3 The median, Q2
4 The third quartile, Q3
5 The maximum score
In a box plot:
• the box shows the middle 50% (the interquartile range) between Q 1 and Q3
• the whiskers extend from the box to the highest and lowest scores
• the whiskers show the range of the scores.
WORKED EXAMPLE 1
Solution
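Rearrange the scores in order and find Q2, then Q1 and Q3.
20 26 40 43 43 57 63 63 63 66 71 74 74 83 87 89
Q1 = 43 (between the 4th and 5th scores), Q2 = 63 (between the 8th and 9th scores), Q3 = 74 (between the 12th and 13th scores).
The five-point summary is (20, 43, 63, 74, 89).
Use the five-point summary and a suitable scale (1 mark = 1 mm) to construct the box-and-whisker diagram or box plot.
[Figure: box plot of the scores; horizontal axis: Score (20 to 90)]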
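The five-point summary for these 16 scores can be checked with a short sketch. The function name is an assumption, and the quartiles are found as the medians of the lower and upper halves (leaving out the middle score when the count is odd), which is one common school convention rather than the only one.

```python
from statistics import median

def five_point_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

scores = [20, 26, 40, 43, 43, 57, 63, 63, 63, 66, 71, 74, 74, 83, 87, 89]
summary = five_point_summary(scores)
print(summary)

# The box spans Q1 to Q3; the whiskers reach the minimum and maximum
print("IQR =", summary[3] - summary[1], "Range =", summary[4] - summary[0])
```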
Use the box plot to find the:
a range
b interquartile range
c median
d percentage of scores above 60
e percentage of scores below 36.
[Figure: box plot; horizontal axis: Score (20 to 80)]
Solution
a Range = maximum score - minimum score
        = 74 - 25
        = 49
b Interquartile range = Q3 - Q1
                      = 60 - 36
                      = 24
c Median = 54
d As Q3 = 60, 25% of the scores are above 60.
e As Q1 = 36, 25% of the scores are below 36.
[Figures a, b and c: box plots; horizontal axis: Score (40 to 80)]
Find the five-point summary for each of the following sets of data and use it to construct a
box plot.
a 7,7,8,8,8,9,9,9, 10, 12, 12, 12
b 16,24,25,25,26,28,28,28,28,30,32,33,34,34,37,38
c 14, 19, 29, 36, 40, 43, 43, 44, 46, 46, 47, 49
[Figure: double box plots for Year 10 and Year 9; horizontal axis: Distance travelled (km), 2 to 16]
Ray and Ken play 40 games of golf over a 1-year period. Their scores are shown on the
double box plots below.
[Figure: double box plots for Ray and Ken; horizontal axis: Score (72 to 88)]
a What is the five-point summary for Ken's scores?
b Which golfer's scores have the smaller range?
c Which golfer's scores have the smaller interquartile range?
d Given your answers to b and c, which golfer do you think is the most consistent?
Give a reason for your answer.
El Rick recorded how long it took him to drive to work over 28 consecutive days. The times
taken to the nearest minute are shown in the frequency table.
Time (minutes) 38 39 40 41 42 43 44 52
Frequency 1 2 6 7 5 4 2 1
One year later, after the addition of traffic lights and other traffic management measures,
Rick repeated the process and obtained the following results.
Time (minutes) 38 39 40 41 42 43 45
Frequency 1 4 8 9 4 1 1
Draw double box plots to illustrate the before and after results and use them to comment on
the effectiveness of the traffic changes.
1 Write what you think the answers are to the three questions above.
2 Use the statements above to do an alphabetic analysis. Were your answers in 1 supported by
the statistics?
As well as deciphering codes, mathematicians are often employed to devise security codes to prevent
access by unauthorised users. In particular, cryptographers are employed to stop computer hackers
from accessing computer records.
On one side we have mathematicians trying to break codes, and on the other side we have
mathematicians trying to design codes that cannot be broken.
Statistics are often used to look at the similarities and differences between sets of data. Here are
some examples.
• Teachers are often interested in comparing the marks of a class on different topics or
comparing the marks of different classes on the same topic.
• Medical researchers could compare the heart rates of different groups of people after exercise.
• Coaches might compare the performances of different players over a season or the same player
over different seasons.
• Managing directors of companies could compare sales and profits over different periods.
As well as calculating the measures of cluster (the mean, median and mode) and the measures of
spread (the range and interquartile range), a comparison would usually involve using graphical
methods. Back-to-back stem-and-leaf plots, double-column graphs, double box plots and
histograms are useful ways of comparing sets of data.
Shape of a distribution
A significant feature of a set of data is its shape. This is most easily seen using a histogram or stem-
and-leaf plot. For some data sets with many scores and a large range, the graph is often shown as
a curve.
The graphs below show the results of 120 students on four different problem-solving tests.
[Figures: Graph A, Graph B, Graph C and Graph D; vertical axes: Frequency (0 to 30); horizontal axes: Score (4 to 10)]
WORKED EXAMPLE
Our class was given a topic test in which we performed poorly. Our teacher decided to
give a similar test one week later, after a thorough revision of the topic. The results
are shown on this back-to-back stem-and-leaf plot. (This is an ordered display.)
Compare the results of the class on the two tests. Note that two students were absent
during Test 1.
Test scores (4 | 1 represents 41)
 Test 1 | Stem | Test 2
  98660 |  3   |
9773111 |  4   | 36688
 885330 |  5   | 1799
  98753 |  6   | 389
        |  7   | 055589
        |  8   | 2677
      0 |  9   | 0013
Solution
• The improvement in the second test is clear to see. The medians, which are easily found, verify this, as do the means.
          Test 1    Test 2
Median    49·5      72·5
the mean.
Test 2 is negatively skewed as more scores are at
the 'high end', indicated by the median being
greater than the mean.
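The comparison in this worked example can be reproduced by reading the scores off the back-to-back plot. In the sketch below the dictionaries map each stem to its leaves written in ascending order (the Test 1 leaves are printed outward from the stem in the plot). The names are illustrative, and the means are computed here from the plot rather than quoted from the text.

```python
from statistics import mean, median

# stem -> leaves in ascending order, read from the back-to-back plot
test_1 = {3: "06689", 4: "1113779", 5: "033588", 6: "35789", 9: "0"}
test_2 = {4: "36688", 5: "1799", 6: "389", 7: "055589", 8: "2677", 9: "0013"}

def expand(plot):
    """Turn a stem-and-leaf dictionary into an ordered list of scores."""
    return sorted(10 * stem + int(leaf) for stem, leaves in plot.items() for leaf in leaves)

scores_1 = expand(test_1)
scores_2 = expand(test_2)

for name, scores in (("Test 1", scores_1), ("Test 2", scores_2)):
    print(name, len(scores), median(scores), round(mean(scores), 1))
```

For Test 2 the median comes out above the mean, which agrees with the description of negative skew. For Test 1 the mean is above the median.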
[Figures: age distributions for School A, School B, School C and School D; horizontal axes: Age (12 to 18)]
a Which schools' age distributions are skewed? What causes the skew?
b Which schools' age distribution is closest to being distributed evenly?
c In which school would the mean age of a student be:
i closest to 15 ii below 15 iii over 15 iv the largest?
The marks for two classes on the same test are shown in the dot plots below.
[Figure: dot plot for Class 1; horizontal axis: Score (82 to 100)]
[Figure: dot plot for Class 2; horizontal axis: Score (80 to 100)]
El A school librarian was interested in comparing the number of books borrowed by boys and
girls. At the end of the year, she looked at the number of books borrowed by each child and
prepared the following graphs.
[Figures: Boys' borrowings and Girls' borrowings; vertical axes marked from 10 to 50]
B The stem-and-leaf plot shows the marks of a class on two different topic tests.
a Which set of marks is nearly symmetric?
b Which set of marks has the smaller spread? What measures of spread can you use to
support your answer?
c Calculate the median and mean for each set of marks. What do they suggest about
the class performance on the two tests?
Class tests
swimmer (C), good swimmer (G) or excellent swimmer (E).
a What was the mode rating before the program?
II In two problem-solving tests, 5 questions were given to a class. The scores are shown below.
Problem test 1
5 1 3 4 3 4 1 1 2 2 5 3 3 1 1 2 4 3 2 4 4 3 2
Problem test 2
0 2 4 4 2 2 3 0 0 2 4 4 1 3 3 2 2 3 2 2 0 2 2
a Arrange the scores into a frequency distribution table and use frequency histograms to
display the data.
b Calculate the mean and median for each test. What do they suggest about the difficulty of
the tests?
c Both sets of scores have a range of 4. Which set of scores has the greater spread?
Give a reason for your answer.
This box plot represents the heights of 30 Year 10 students. The histogram also represents the
heights of 30 students.
[Figure: box plot and histogram of the heights]
[Figure: double box plots for Brand X and Brand Y]
Before After
44 53 38 39 52 41 40 41 40 39 43 42 54 48 46 44
43 43 42 57 47 45 50 50 51 52 44 38 40 40 51 52
68 50 45 42 58 48 40 39 39 46 46 49 52 51 42 43
44 46 52 45 46 53 54 40 40 45 44 39 50 43 48 40
48 47 43 38 43 42 54 55 52 53 38 40 47 44 47 42
The gradual ageing of Australia's population has caused a rethinking of the government's policy
towards pensions, superannuation and caring for the aged.
POPULATION STRUCTURE, by Age and Sex - 1987 and 2007
[Figure: population pyramid by five-year age group up to 85+; vertical axis: Age (years); horizontal axes: Males (%) and Females (%), each from 0 to 5]
Where your answers are percentages, give them correct to one decimal place.
3 What do your answers to Questions 1 and 2 suggest about the changes in the male
population over the 20 years from 1987 to 2007? Does this trend also occur in the female
population?
4 a What percentage of Australians were aged 70 and over in:
i 1987 ii 2007?
b If there were 1 920 200 people aged over 70 in 2007, calculate the total population of
Australia in that year. Give your answer to the nearest thousand.
5 The approximate number of Australian females aged 65-69 was 300 100 in 1987 and
401 200 in 2007 and yet, looking at the diagram, the percentage of females in the 65-69 age
group was relatively similar in 1987 and 2007. How is this possible?
MATHS TERMS 14
box-and-whisker plot (box plot)
• a diagram obtained from the five-point summary
• the box shows the middle 50% of scores (the interquartile range)
• the whiskers show us the extent of the bottom and top quartiles as well as the range
[Figure: example box plot on a scale from 4 to 14]
class centre
• the middle outcome of a class,
e.g. the class 1-5 has a class centre of 3
class interval
• the size of the groups into which the data is organised,
e.g. 1-5 (5 scores); 11-20 (10 scores)
cumulative frequency histogram (and polygon)
• these show the outcomes and their cumulative frequencies
[Figure: example cumulative frequency histogram and polygon; horizontal axis: Outcome (3 to 6)]
dot plot
• a graph that uses one axis and a number of dots above the axis
five-point summary
• a set of numbers consisting of the minimum score, the three quartiles and the maximum score
frequency
• the number of times an outcome occurs in the data,
e.g. for the data 3, 6, 5, 3, 5, 5, 4, 3, 3, 6 the outcome 5 has a frequency of 3
frequency distribution table
• a table that shows all the possible outcomes and their frequencies (it usually is extended by adding other columns such as the cumulative frequency),
e.g.
Outcome    Frequency    Cumulative frequency
3          4            4
4          1            5
5          3            8
6          2            10
frequency histogram
• a type of column graph showing the outcomes and their frequencies
[Figure: example frequency histogram; horizontal axis: Outcome (3 to 6)]
frequency polygon
• a line graph formed by joining the midpoints of the top of each column; to complete the polygon the outcomes immediately above and below those present are used (the heights of these columns is zero)
[Figure: example frequency polygon; horizontal axis: Outcome (3 to 6)]
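The frequency distribution table given as an example in these terms can be rebuilt from the raw data quoted under 'frequency'. A minimal sketch, with variable names chosen for illustration:

```python
from collections import Counter
from itertools import accumulate

data = [3, 6, 5, 3, 5, 5, 4, 3, 3, 6]

counts = Counter(data)                 # outcome -> frequency
outcomes = sorted(counts)
freqs = [counts[x] for x in outcomes]
cum_freqs = list(accumulate(freqs))    # running total of the frequencies

table = list(zip(outcomes, freqs, cum_freqs))
for row in table:
    print(row)  # (outcome, frequency, cumulative frequency)
```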
STATISTICS
These questions reflect the important skills introduced in this chapter.
Errors made will indicate areas of weakness.
Each weakness should be treated by going back to the section listed.
1 The students of class 9M were given a reading test and rated from 0 (a poor reader) to 5 (an excellent reader). The results are given below.     14:01, 14:02
4 1 0 2 3 3 3 2 2 1
0 2 2 4 3 5 3 2 1 3
2 0 3 1 3 4 5 1 0 2
a Complete this frequency distribution table.
Outcome (x)    Tally    f    cf
0
1
2
3
4
5
Total:
b What is the frequency of 5?
c How many students were given a rating less than 4?
d On the same diagram, draw the frequency histogram and the frequency polygon.
e On the same diagram draw the cumulative frequency histogram and the cumulative frequency polygon.
f What is the range of these scores?
g Find the mode, median and mean for these scores.
2 Use your calculator to evaluate the mean for the scores in the following frequency 14:01
tables. Give your answer correct to two decimal places.
a Outcome    Freq.
  27         18
  28         50
  29         23
  30         9
b Outcome    4·1    4·2    4·3    4·4    4·5    4·6    4·7
  Freq.      7      11     16     8      12     7      3
3 These are the scores gained by each team competing in the Lithgow car rally 14:01,
this year. 14:02
27 18 0 45 63 49 50 31 9 26
4 41 38 20 69 38 17 43 16 37
28 14 58 52 37 43 38 51 44 33
25 38 11 43 40 56 62 48 53 22
a Draw a grouped frequency table using classes 0-9, 10-19 etc. Use the columns:
class, class centre, tally, frequency and cumulative frequency.
b Prepare a stem-and-leaf plot for the scores above.
5 7 12 15 2 37
I
6 10 13 33 3 44
I
7 16 14 53 4 47
I
8 23 15 62 5 so
5 Use the following graphs to calculate the median for each set of data. 14:02
[Figures a, b and c: cumulative frequency histograms and polygons; vertical axes: Cumulative frequency (a: 0 to 40, b: 0 to 20, c: 0 to 40); horizontal axes: Outcome (a: 3 to 8, b: 5 to 9, c: 11 to 15)]
f 2 0 5 4 5 6 5 6 3 4
c The lengths of 16 fish caught were measured. The results are shown on this dot
plot. What is the interquartile range?
[Figure: dot plot; horizontal axis: Length of fish (cm), 20 to 34]
7 Find the five-point summary for each set of data in Question 6. 14:04
8 Draw box plots for the data in Question 6 a, b and c. 14:04
9 These double box plots were drawn to compare the results of Year 10 in two tests. 14:05
[Figure: double box plots for Test 1 and Test 2 with Q1, Q2 and Q3 marked; horizontal axis: Score (30 to 90)]
a By how much was the median for Test 2 higher than the median of Test 1?
b What was the range and interquartile range of Test 1?
10 Test 1 scores Test 2 scores
12 17 19 12 15 10 21 15 18 7 11 16
9 22 24 11 18 8 20 12 23 12 10 13
25 15 18 20 18 18 12 19 12 14 20 9
a Draw a dot plot for the scores on Test 1.
b Draw a back-to-back stem-and-leaf plot to compare the scores on Test 1 and
Test 2.
c Draw double box plots to compare the scores on Tests 1 and 2.
The results are shown on this dot plot.
[Figure: dot plot; horizontal axis: Length of index finger (mm), 62 to 82]
a Are any outliers present in this data?
b Find the five-point summary for the data if the outlier is:
i included ii omitted.
c What is the interquartile range if the outlier is:
i included ii omitted?
d Comment on the shape of the distribution (ignore the outlier in this case).
5 After the Year 8 semester exam the maths staff organised the 99 marks into a grouped frequency distribution. The results are shown in the table.
Class    Class centre (c.c.)    f     c.f.    f × c.c.
10-19                           2
20-29                           9
30-39                           10
40-49                           8
50-59                           16
60-69                           20
70-79                           13
80-89                           14
90-99                           7
a Copy and complete the grouped frequency distribution table.
Use it to find:
i the modal class ii the mean.
b Construct an ogive and use it to find the median class.
3 When this net is folded to form a cube, the numbers on the three faces that meet at each vertex are multiplied together.
6 Choose the heading from the list below that would best fit each graph.
2 On a photocopier the enlargement and reduction factors are given as percentages. 11:01
Find the enlargement and reduction factor for the following.
a Enlarge a photograph that is 18 cm long and 13 cm wide so that it is 28·8 cm
long and 20·8 cm wide.
b Reduce a drawing that is a square of side 16 cm to a square of side 12 cm.
3 a Find the gradient and y-intercept of the line AB and use it to write its equation.     9:05, 9:06, 10:02A
b Find the equation of the line DC and use it to find the x-intercept of the line.
c Use simultaneous equations to find the point of intersection of line AB with the line 2x + y = 4.
[Figure: lines drawn on a number plane]
7 At the end of the financial year Hannah's total income is $71 325. She has allowable 8:04
tax deductions of $3327 and throughout the year she has made PAYG deductions
totalling $14 924. Calculate:
a her taxable income
b the tax payable on her taxable income if she has to pay $4650 plus 30c for each
dollar over $37 000
c the Medicare levy, which is 1·5% of her taxable income
d the refund due or the amount of tax still payable.