Statistics Review
Statistics is the science concerned with the collection, analysis and interpretation of
numerical information (i.e. data). One of the aims of statistics is to rearrange and condense the raw
information into a form which is more easily read, so that patterns and characteristics may be
identified, conclusions drawn and inferences made.
Categorical data (Qualitative) – data which do not involve numbers or measurements and are put
into categories such as types of cars.
Nominal data – categorical data which have no order associated with them. (e.g. students’ hair
colour)
Ordinal data – categorical data that are associated with some qualitative scale. (e.g. responses
ranging from strongly agree to strongly disagree)
Continuous data – data collected by measuring and can take any value within a given range. (e.g.
weight of a person.)
Discrete data – data collected by counting and can take only certain values, usually whole numbers.
(e.g. number of children in a family)
EXERCISE 3.0
1. State whether the numerical data gathered in each of the following situations are discrete or
continuous.
2. For each of the following, state whether the data are categorical or numerical. If numerical, state
whether the data are discrete or continuous.
3. An opinion poll was conducted. A thousand people were given the statement ‘Euthanasia
should be legalised’. Each person was offered five responses: strongly agree, agree, unsure,
disagree and strongly disagree. Describe the data type in this example.
4. A teacher marks her students’ work with a grade A, B, C, D, or E. Describe the data type in
this example.
5. The number of people who are using a particular bus service is counted over a 2-week
period. The data formed by this survey would best be described as:
A. Categorical data
B. Numerical and discrete data
C. Numerical and continuous data
D. Quantitative data
Organizing of data – Frequency Table
Raw data: collected information which is not organized numerically, that is, it is not
arranged in any sort of order.
This is an example of raw data and we see that it is not organized into any sort of order.
One way of organizing raw data into order is to arrange it in the form of a frequency
distribution table. A frequency distribution table is a table which displays the frequency
(the number of times it occurs) for each category of the data. In this example, the number of
students obtaining each mark is found. The tally column of the frequency table enables the
frequency of each category to be quickly totalled.
On examining the raw data, we see that the smallest mark is 1 and the highest mark is 8. The
marks from 1 to 8 inclusive are written in column 1 of the tally chart. We now take each
figure from the raw data just as it comes and place a tally mark opposite the appropriate
mark.
The fifth tally mark for each number is usually made in an oblique direction, thereby tying
the tally marks into bundles of five.
When the tally marks are completed they are counted and the numerical value recorded in
the column headed ‘frequency’. Hence the frequency is the number of times each mark
occurs. From the tally chart below it will be seen that the mark 1 occurs twice (a frequency
of 2), the mark 5 occurs 12 times (a frequency of 12) and so on.
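The tallying procedure described above can be sketched in Python. This is only an illustration: the marks are made up (they are not the raw data from these notes) and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(scores):
    """Return (score, tally, frequency) rows in ascending score order."""
    counts = Counter(scores)
    rows = []
    for score in range(min(scores), max(scores) + 1):
        freq = counts[score]
        # Bundle tally marks in fives, as in a hand-drawn tally chart:
        # "||||/" stands for four strokes plus one oblique stroke.
        bundles, rest = divmod(freq, 5)
        tally = " ".join(["||||/"] * bundles + (["|" * rest] if rest else []))
        rows.append((score, tally, freq))
    return rows


# Illustrative marks only (an assumption, not the data from the notes).
marks = [3, 5, 1, 4, 5, 8, 5, 2, 5, 4, 1, 5, 6, 5, 7]

for score, tally, freq in frequency_table(marks):
    print(f"{score:>4}  {tally:<12} {freq:>3}")
```

The frequencies in the last column should always add up to the number of raw scores, which is a quick check that no score was missed.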
Example 1
Table 1: Frequency distribution table showing marks scored by 50 students in a test out of
10.
Example 2
A survey is conducted among 24 students who were asked to name their favourite spectator
sport. Their responses are recorded below.
Table 2: Frequency distribution table showing the favourite spectator sport of 24 students.
GROUPED DISTRIBUTION
When dealing with a large amount of numerical data it is useful to group the numbers into classes or
categories. We can then find out the number of items belonging to each class thus obtaining a class
frequency.
Example 1
The numbers shown below are the times, to the nearest second, for 40 children to complete a length
of a swimming pool. The swimmers were divided into heats as the pool has eight lanes.
Display these data in the form of a grouped frequency table using the intervals 30 – 34, 35 – 39,
etc.
The main advantage of grouping is that it produces a clear overall picture of the distribution.
However, too many groups will destroy the pattern of distribution, whilst too few will destroy much
of the detail which was presented in the raw data. Depending upon the volume of the raw data, the
number of classes is usually between 5 and 20.
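Grouping whole-number data into equal classes can be sketched as follows. The times are invented for illustration (the 40 swimming times are not reproduced here), and `grouped_frequency` is a hypothetical helper name.

```python
def grouped_frequency(values, start, width, n_classes):
    """Count the values in each class: start..start+width-1, and so on."""
    table = {}
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1
        # Both class limits are included in the class.
        table[(lower, upper)] = sum(lower <= v <= upper for v in values)
    return table


# Illustrative times in whole seconds (made up, not the data from the notes).
times = [31, 36, 42, 38, 33, 47, 40, 35, 39, 44, 52, 37]

table = grouped_frequency(times, start=30, width=5, n_classes=5)
for (lower, upper), freq in table.items():
    print(f"{lower} - {upper}: {freq}")
```

Changing `width` and `n_classes` shows the trade-off described above: wide classes hide detail, while narrow classes break up the overall pattern.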
Class Intervals
In table 2, the first class is 30 – 34. These figures give the class interval. For the second class, the
class interval is 35 – 39. The end numbers 35 and 39 are called class limits for the second class, 35
being the lower class limit and 39 being the upper class limit.
The class size is 5, that is, the difference between consecutive class centres, or the upper class
limit minus the lower class limit plus one (39 − 35 + 1 = 5). It can also be calculated by subtracting
the lower class boundary from the upper class boundary.
The class centre of the first class is 32. That is, the lower class limit plus the upper class limit,
divided by 2.
Class centre (mid-point) of first class = $\frac{30 + 34}{2} = 32$
Class boundary
In table 2, the times have been recorded to the nearest second. The class interval 35 – 39
theoretically includes all the time between 34.5 and 39.5 seconds. These numbers are called the
lower and upper class boundaries respectively for the second class. For any distribution, the class
boundaries may be found by adding the upper class limit of one class to the lower class limit of the
next class and dividing the sum by two. For example, $\frac{34 + 35}{2} = 34.5$.
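The class centre, class size and class boundary rules can be checked with a short Python sketch. The function names are hypothetical, and the third class 40 – 44 is assumed as the natural continuation of the intervals 30 – 34, 35 – 39.

```python
def class_centre(lower_limit, upper_limit):
    """Mid-point of a class: (lower limit + upper limit) / 2."""
    return (lower_limit + upper_limit) / 2


def class_size(lower_limit, upper_limit):
    """Upper limit minus lower limit plus one (whole-number data)."""
    return upper_limit - lower_limit + 1


def boundaries_between(classes):
    """Boundary between adjacent classes:
    (upper limit of one class + lower limit of the next) / 2."""
    return [
        (classes[i][1] + classes[i + 1][0]) / 2
        for i in range(len(classes) - 1)
    ]


# Class limits; the third class is an assumed continuation of the pattern.
classes = [(30, 34), (35, 39), (40, 44)]

print(class_centre(30, 34))         # centre of the first class
print(class_size(35, 39))           # size of the second class
print(boundaries_between(classes))  # boundaries shared by adjacent classes
```

The two boundaries returned are the lower and upper boundaries of the second class, and their difference equals the class size.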
Example 2
A year 11 class was surveyed on their weekly income. The responses are shown below.
b) State the upper and lower class boundaries of the second class.
(i) Lower class boundary = $\frac{20 + 21}{2} = 20.5$
(ii) Upper class boundary = $\frac{40 + 41}{2} = 40.5$
EXERCISE 3.1
1. A class of students was asked to identify the type of car their family owned. Their responses
are shown below.
a) Rearrange the data into a frequency distribution table using a tally column.
b) How many students sat for the test?
c) How many scored full marks?
d) How many scored zero?
e) What percentage of students scored 5 or more?
4. The marks scored on a Math exam, out of 100, by 25 year 11 students are shown below.
5. The data below show the number of customers that entered a shop in a certain month.
The main disadvantage of grouped distribution tables is that some information is lost. For example,
it is not possible to determine the lowest and highest scores from the table.
Stem and leaf plots are another way of displaying information. They are used to group and rank data
to show the range and distribution of the data.
The stem is the first digit(s) and the leaf is the final digit of a number.
A stem may have any number of digits but a leaf has exactly one.
Example 1
43, 45, 46, 22, 65, 65, 23, 53, 45, 26, 46, 61, 51, 57, 55,
55, 66, 57, 42, 41, 63, 70, 57, 65, 48, 23, 67, 62, 70, 46
In this stem and leaf plot, the tens digit forms the stem and the units digit forms the leaf.
This means that for the mark 45, the stem is 4 and the leaf is 5.
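As a rough sketch, the stem and leaf plot for the 30 marks listed in this example can be built in Python. The function name `stem_and_leaf` is hypothetical, and the code assumes non-negative whole numbers.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 45 -> stem 4, leaf 5
        plot[stem].append(leaf)
    # Keep empty stems so that gaps in the distribution stay visible.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


# The marks given in Example 1.
marks = [43, 45, 46, 22, 65, 65, 23, 53, 45, 26, 46, 61, 51, 57, 55,
         55, 66, 57, 42, 41, 63, 70, 57, 65, 48, 23, 67, 62, 70, 46]

for stem, leaves in stem_and_leaf(marks).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting the values first means the leaves on each stem are already ranked, so the lowest and highest marks can be read directly from the first and last rows.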
EXERCISE 3.2
1. a) Draw a stem and leaf plot, using the stems 3, 4, 5 and 6 for the scores
2. a) Draw a stem and leaf plot, using the stems 12, 13, 14, and 15, for the scores
a) 16, 24, 13, 8, 22, 4, 5, 26, 14, 10, 2, 20, 11, 23, 8, 7, 24, 8, 12, 9
b) 263, 258, 281, 265, 274, 270, 283, 270, 254, 265, 280, 274, 256, 280, 251, 263, 280, 278, 259, 260
4. The following stem and leaf plot shows the time spent (hours) watching TV by a group of students
during one week.
e) Draw a grouped frequency table to represent this data, using class intervals of
f) Comment on the advantages and disadvantages of the stem and leaf plot compared with the
grouped frequency tables.
DISPLAYING DATA
The most common way of displaying data is by using a graph. Different graphs have different
purposes. Common graphs used to highlight the trends in collected data include column graphs,
sector (pie) graphs, histograms and stem-and-leaf plots. The aim of using graphs to display data is
to present the information in a way which is visually attractive. Although there is some loss of detail
compared with a table, it is often easier to see trends and relationships. Often a table and a graph
are used together to convey the maximum amount of information.
Column graphs
A column graph is used when we wish to show a quantity for each category. Categories are written
on the horizontal axis and frequencies on the vertical axis.
Example 1
The table below shows the result of the survey on favourite sports.
Sector graphs
A sector graph or pie graph is used when we want to display a comparison of quantities. An angle is
drawn at the centre of the circle that is the same fraction of 360° as the fraction of people making
each response. The area of each sector is proportional to the size of each category and hence each
sector angle is proportional to the size of each category.
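The sector angle rule (angle = frequency ÷ total × 360°) can be sketched in Python. The survey counts below are invented for illustration and are not the results from the notes; `sector_angles` is a hypothetical helper name.

```python
def sector_angles(frequencies):
    """Return the sector angle in degrees for each category."""
    total = sum(frequencies.values())
    # Each angle is the same fraction of 360 degrees as the
    # category's share of the total frequency.
    return {name: freq * 360 / total for name, freq in frequencies.items()}


# Illustrative counts for 24 students (made up, not the data from the notes).
survey = {"AFL": 6, "Cricket": 4, "Soccer": 8, "Netball": 3, "Tennis": 3}

angles = sector_angles(survey)
for sport, angle in angles.items():
    print(f"{sport:<8} {angle:6.1f} degrees")
```

The angles should always add up to 360°, which is a useful check before drawing the graph.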
Example 2
[Sector graph legend – Sports: AFL, Cricket, Rugby league, Rugby union, Soccer, Basketball, Netball, Tennis]
EXERCISE 3.3
2. Draw a sector graph to display the data given below.
FREQUENCY HISTOGRAM
A frequency histogram is similar to a column graph with the following essential features.
1. Gaps are never left between the columns, except for a half unit space before the first
column.
2. If the chart is coloured or shaded, then it is done all in one colour.
3. Frequency is always plotted on the vertical axis.
4. For ungrouped data the horizontal scale is marked so that the data labels appear under the
centre of each column. For grouped data the horizontal scale is marked so that the class
centre of each class appears under the centre of the column.
EXAMPLE 1
The table below shows the number of people living in each house in a street.
A frequency polygon is a line graph that can be drawn by joining the centres of the tops of the
columns of the histogram. The polygon starts and finishes on the horizontal axis, a half column width
from the group boundary of the first and last groups. The frequency polygon highlights
changes in the distribution.
Example 2
The frequency table below shows a class set of marks on an exam. Draw a frequency histogram and
polygon on the same axes.
EXERCISE 3.4
3. The label on a box of matches states that the average contents of a box is 50 matches.
Quality control surveyed 50 boxes for the number of matches and the results are shown
below.
Relative Frequency
The relative frequency of a particular score (or class interval) indicates the significance of that
score (or interval) when compared to the entire distribution. The relative frequency is the
fraction of times that the score occurs, that is, its frequency divided by the total number of scores.
For the purpose of comparison, it is often convenient to express this fraction as a percentage
(sometimes called the percentage relative frequency), and this is how the relative frequency of a
particular score (or class interval) is usually given.
A relative frequency column is easily incorporated into a frequency table. Relative frequency
histograms and relative frequency polygons enable the information to be presented visually as shown
below.
Cumulative Frequency
A frequency table may be extended to introduce a cumulative frequency column that keeps a
running total of the number of scores included.
The information in the table may be represented using a histogram.
A cumulative frequency polygon is obtained by connecting the top right-hand corner of each bar on
the histogram as shown.
Note: The cumulative frequency curve is normally referred to as the ogive.
Example 1
a)
Number of Candychocs   Frequency   Cumulative frequency   Relative frequency
36                     1           1                      $\frac{1}{40} = 2.5\%$
37                     5           6                      $\frac{5}{40} = 12.5\%$
38                     8           14                     $\frac{8}{40} = 20\%$
39                     13          27                     $\frac{13}{40} = 32.5\%$
40                     7           34                     $\frac{7}{40} = 17.5\%$
41                     4           38                     $\frac{4}{40} = 10\%$
42                     2           40                     $\frac{2}{40} = 5\%$
Total                  40                                 $1 = 100\%$
c) i. $\frac{5}{40} = \frac{1}{8}$   ii. $\frac{7}{40}$
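The cumulative and relative frequency columns of the Candychocs table can be reproduced with a few lines of Python. This is a sketch using the frequencies given in the example; the variable names are chosen here for illustration.

```python
from itertools import accumulate

# Number of Candychocs -> frequency, as given in the table.
counts = {36: 1, 37: 5, 38: 8, 39: 13, 40: 7, 41: 4, 42: 2}

total = sum(counts.values())

# Running total of the frequencies (cumulative frequency column).
cumulative = list(accumulate(counts.values()))

# Each frequency as a percentage of the total (relative frequency column).
relative_pct = [100 * f / total for f in counts.values()]

for score, cum, rel in zip(counts, cumulative, relative_pct):
    print(f"{score}  {counts[score]:>3}  {cum:>3}  {rel:>5.1f}%")
```

The last cumulative frequency equals the total number of scores, and the relative frequencies add up to 100%.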
The cumulative frequency curve provides a useful tool for extracting information about the
distribution of results within a set of data.
A cumulative frequency curve (i.e. an ogive curve) is provided in the frame below. Familiarize
yourself with the graph then answer the questions that follow.
EXERCISE 3.5
1. a) What mark was the test out of? b) How many students did the test?
2. Draw up a frequency table using the information provided on the graph.
Study the examples next to the graph, then answer the following.
We consider the number of students with marks < 5 and subtract it from the total number of students.
Ans: 26 – 5 = 21
We consider the number of students with marks ≤ 6 and subtract it from the total number of students.
Ans: 26 – 11 = 15
5. If the students’ marks are arranged in ascending order, give the range of marks obtained by:
Frequency tables and associated graphs assist in establishing trends present in sets of data; however,
there are other measures available which attempt to refine the data into single representative
scores.
The five measures are explained in the following sections.
EXERCISE 3.6
Determine the mean, median and modal values for the following sets of data.
7, 9, 2, 2, 5, 11
a) 1, 1, 3, 5, 5, 5, 6, 9, 10, 10
b) 12, 18, 15, 10, 18, 20, 6, 11, 9, 18, 0
c) -2, -5, -4, -3, -2, -8, -2, -6, -4
d) $\frac{1}{2}, \frac{1}{4}, \frac{3}{4}, 0, 1, \frac{1}{2}$
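Answers to exercises like these can be checked with Python's standard `statistics` module. The sketch below uses only the first data set listed above (7, 9, 2, 2, 5, 11); note that `statistics.mode` returns a single value, so it is only a suitable check when the data set has one mode.

```python
import statistics

# First data set from the exercise.
data = [7, 9, 2, 2, 5, 11]

mean = statistics.mean(data)      # sum of the scores / number of scores
median = statistics.median(data)  # middle of the ordered scores
mode = statistics.mode(data)      # most frequently occurring score

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```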
USING YOUR CALCULATOR TO FIND THE MEAN
Given below is one way of finding the mean using a CASIO calculator. Check the instruction booklet
to determine the appropriate steps for your calculator.
Step 1: Set the calculator to statistics mode (SD) by pressing MODE and the key for statistics
functions.
Step 3: Enter the first score and press the M+ key. Repeat for each score.
Step 4: When all the scores have been entered, press the appropriate key for the mean, $\bar{x}$.
When the results are grouped in the form of a frequency table, the mean can be found readily using
a series of sub totals as shown.
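The subtotal method multiplies each score by its frequency, adds the products, and divides by the total frequency. A minimal Python sketch, reusing the Candychocs frequencies from the earlier cumulative frequency example as sample data (the function name `mean_from_table` is hypothetical):

```python
def mean_from_table(table):
    """Mean of grouped results: sum of (score x frequency) / sum of frequencies."""
    sum_fx = sum(score * freq for score, freq in table.items())  # subtotals f*x
    sum_f = sum(table.values())                                  # total frequency
    return sum_fx / sum_f


# Score -> frequency, taken from the Candychocs example.
candychocs = {36: 1, 37: 5, 38: 8, 39: 13, 40: 7, 41: 4, 42: 2}

print(mean_from_table(candychocs))
```

For a grouped table with class intervals, the class centre would take the place of the score, which gives an estimate of the mean rather than its exact value.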
Exercise: Copy and complete the table and so determine the mean. (round the answers to 4
significant figures).
DETERMINATION OF THE MEDIAN FROM A FREQUENCY TABLE – UNGROUPED DATA.
The median score is readily determined from a frequency table, provided a cumulative frequency
column is used.
A. For an odd number of scores, the middle score is the one located in the position given by
$\frac{N + 1}{2}$, where ‘N’ represents the total number of scores.
Eg. 1 For 13 scores, the median is the score in the $\frac{13 + 1}{2}$ or 7th position.
B. For an even number of scores, the middle score is taken as the average of the scores in positions
$\frac{N}{2}$ and $\frac{N}{2} + 1$.
Eg. 2 For 10 scores, the median is found by averaging the scores in the positions $\frac{10}{2}$ and
$\frac{10}{2} + 1$, i.e. the 5th and 6th scores.
Once the location of the middle score(s) is determined the cumulative frequency table can be scanned
to identify its value.
Example
Note: since there are 20 scores, the median is the average of the 10th and 11th scores.
Median = $\frac{4 + 5}{2} = 4.5$
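Locating the median through the cumulative frequency column can be sketched in Python. The frequency table below is made up; it was chosen only so that it has 20 scores with the 10th score equal to 4 and the 11th equal to 5, matching the note above. The function names are hypothetical.

```python
from itertools import accumulate


def score_at(position, scores, cumulative):
    """Return the score in the given (1-based) position of the ordered data."""
    for score, cum in zip(scores, cumulative):
        # The first score whose cumulative frequency reaches the position.
        if position <= cum:
            return score
    raise ValueError("position exceeds the number of scores")


def median_from_table(table):
    """Median of ungrouped data given as a score -> frequency table."""
    scores = sorted(table)
    cumulative = list(accumulate(table[s] for s in scores))
    n = cumulative[-1]
    if n % 2 == 1:
        # Odd N: the score in position (N + 1) / 2.
        return score_at((n + 1) // 2, scores, cumulative)
    # Even N: average of the scores in positions N / 2 and N / 2 + 1.
    low = score_at(n // 2, scores, cumulative)
    high = score_at(n // 2 + 1, scores, cumulative)
    return (low + high) / 2


# Illustrative table with 20 scores (made up, not the table from the notes).
table = {2: 3, 3: 3, 4: 4, 5: 6, 6: 4}

print(median_from_table(table))
```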