Statistic Review

The document provides an overview of statistics, including the collection, analysis, and interpretation of data. It categorizes data into types such as categorical (nominal and ordinal) and numerical (discrete and continuous), and discusses methods for organizing and displaying data, including frequency tables and stem-and-leaf plots. Additionally, it includes exercises for practical application of statistical concepts.

Uploaded by

mukiushilla002
Copyright
© All Rights Reserved

STATISTICS

Statistics is the science concerned with the collection, analysis and interpretation of
numerical information (i.e. data). One of the aims of statistics is to rearrange and condense raw
information into a form that is more easily read, so that patterns and characteristics may be
identified, conclusions drawn and inferences made.

Categorical data (Qualitative) – data which do not involve numbers or measurements and are put
into categories such as types of cars.

Nominal data – categorical data which have no order associated with them. (e.g. students’ hair
colour)

Ordinal data – categorical data that are associated with some qualitative scale. (e.g. responses
ranging from strongly agree to strongly disagree)

Numerical (Quantitative) – data which involve numbers or measurement.

Continuous data – data collected by measuring and can take any value within a given range. (e.g.
weight of a person.)

Discrete data – data collected by counting and can take only certain values, usually whole numbers.
(e.g. number of children in a family)

EXERCISE 3.0

1. State whether the numerical data gathered in each of the following situations are discrete or
continuous.

a. The heights of 60 tomato plants at a plant nursery.


b. The number of jelly beans in each of 50 packets.
c. The time taken for each student in a class of sixty-year-olds to tie his or her
shoelaces.
d. The petrol consumption rate of a large sample of cars.
e. The IQ (intelligence quotient) of each student in a class.

2. For each of the following, state if the data are categorical or numerical. If numerical, state if
the data are discrete or continuous.

a. The number of students in each class at your school.


b. The teams people support in a football match
c. The brands of peanut butter sold at a supermarket.
d. The heights of people in your class.
e. The interest rate charged by each bank.
f. A person’s pulse rate.

3. An opinion poll was conducted. A thousand people were given the statement ‘Euthanasia
should be legalised’. Each person was offered five responses: strongly agree, agree, unsure,
disagree and strongly disagree. Describe the data type in this example.

4. A teacher marks her students’ work with a grade A, B, C, D, or E. Describe the data type in
this example.

5. The number of people using a particular bus service is counted over a 2-week
period. The data formed by this survey would best be described as:

A. Categorical data
B. Numerical and discrete data
C. Numerical and continuous data
D. Quantitative data

6. The graph at right shows the number of days of each weather type for Gold Coast in January.
Describe the data in this example.

Organizing of data – Frequency Table
Raw data: collected information which is not organized numerically, that is, it is not
arranged in any sort of order.

Consider the marks of 50 students obtained in a test.

This is an example of raw data and we see that it is not organized into any sort of order.

Frequency distribution table – Ungrouped data

One way of organizing raw data into order is to arrange it in the form of a frequency
distribution table. A frequency distribution table is a table which displays the frequency
(number of times it occurs) for each of the categories of the data. The number of students
obtaining each mark is found. The tally column of the frequency table enables frequency of
each interval to be quickly totaled.

On examining the raw data, we see that the smallest mark is 1 and the highest mark is 8. The
marks from 1 to 8 inclusive are written in column 1 of the tally chart. We now take each
figure from the raw data just as it comes and place a tally mark opposite the appropriate
mark.

The fifth tally mark for each number is usually made in an oblique direction, thereby tying
the tally marks into bundles of five.

When the tally marks are completed they are counted and the numerical value recorded in
the column headed ‘frequency’. Hence the frequency is the number of times each mark
occurs. From the tally chart below it will be seen that the mark 1 occurs twice (a frequency
of 2), the mark 5 occurs 12 times (a frequency of 12) and so on.
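
The tallying procedure can be sketched in a few lines of Python. This is only an illustration: the list `marks` is a small made-up sample (not the 50 test marks), and `tally` and `frequency_table` are hypothetical helper names. A bundle of five is drawn here as four strokes plus an oblique stroke.

```python
from collections import Counter

def tally(count):
    """Return tally marks for a count, tied into bundles of five."""
    bundles, rest = divmod(count, 5)
    parts = ["||||/"] * bundles          # four strokes plus the oblique fifth
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)

def frequency_table(data):
    """List (value, frequency) for every value from the smallest to the largest."""
    counts = Counter(data)
    return [(value, counts[value]) for value in range(min(data), max(data) + 1)]

# Made-up sample of marks, used only to demonstrate the layout.
marks = [3, 5, 1, 4, 5, 2, 5, 3, 5, 4, 5, 1, 5, 4]

print("Mark  Tally        Frequency")
for mark, freq in frequency_table(marks):
    print(f"{mark:>4}  {tally(freq):<12} {freq:>3}")
```

Each row shows a mark, its tally marks and the frequency obtained by counting them.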

Example 1

Table 1: Frequency distribution table showing marks scored by 50 students in a test out of
10.

Example 2

A survey is conducted among 24 students who were asked to name their favourite spectator
sport. Their responses are recorded below.

Table 2: Frequency distribution table showing the favourite spectator sport of 24 students.

GROUPED DISTRIBUTION

When dealing with a large amount of numerical data it is useful to group the numbers into classes or
categories. We can then find out the number of items belonging to each class thus obtaining a class
frequency.

Example 1

The numbers shown below are the times, to the nearest second, for 40 children to complete a length
of a swimming pool. The swimmers were divided into heats as the pool has eight lanes.

Display this data in the form of a grouped frequency table using the intervals 30 – 34, 35 – 39,
etc.

The main advantage of grouping is that it produces a clear overall picture of the distribution.
However too many groups will destroy the pattern of distribution whilst too few will destroy much
of the detail which was presented in the raw data. Depending upon the volume of the raw data, the
number of classes is usually between 5 and 20.
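
Grouping can be sketched in Python as follows. The swimming times themselves are not reproduced in these notes, so `times` below is a made-up list, and `grouped_frequency` is a hypothetical helper that assumes whole-number data no smaller than the first lower class limit.

```python
def grouped_frequency(data, start, width):
    """Count the values falling in each class: start..start+width-1, and so on."""
    classes = {}
    lower = start
    while lower <= max(data):
        classes[(lower, lower + width - 1)] = 0
        lower += width
    for value in data:
        index = (value - start) // width       # which class the value belongs to
        lo = start + index * width
        classes[(lo, lo + width - 1)] += 1
    return classes

# Made-up times in seconds, for illustration only.
times = [31, 36, 42, 38, 33, 47, 40, 35, 39, 44, 52, 37]

for (lo, hi), freq in grouped_frequency(times, 30, 5).items():
    print(f"{lo} - {hi}: {freq}")
```

Changing `width` shows the trade-off described above: a very small width scatters the data over many classes, while a very large width hides most of the detail.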

Class Intervals

In table 2, the first class is 30 – 34. These figures give the class interval. For the second class, the
class interval is 35 – 39. The end numbers 35 and 39 are called class limits for the second class, 35
being the lower class limit and 39 being the upper class limit.

The class size is 5. It is the difference between successive class centres, or the upper class limit
minus the lower class limit plus one. It can also be calculated by subtracting the lower class
boundary from the upper class boundary.

Class size (second class): 37 – 32 = 5 or (39 – 35) + 1 = 5 or 39.5 – 34.5 = 5

The class centre of the first class is 32. That is, the lower class limit plus the upper class limit, divided by 2.

Class centre (mid-point) of first class = $\frac{30 + 34}{2} = 32$

Class boundary

In table 2, the times have been recorded to the nearest second. The class interval 35 – 39
theoretically includes all the times between 34.5 and 39.5 seconds. These numbers are called the
lower and upper class boundaries respectively for the second class. For any distribution, the class
boundaries may be found by adding the upper class limit of one class to the lower class limit of the
next class and dividing the sum by two.
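
The rules for class boundaries, class centres and class size can be checked with a short Python sketch. The function names are hypothetical, and `class_size` assumes data recorded as whole numbers, as in the swimming example.

```python
def class_boundaries(classes):
    """Boundary between adjacent classes: (upper limit + next lower limit) / 2."""
    return [(upper + next_lower) / 2
            for (_, upper), (next_lower, _) in zip(classes, classes[1:])]

def class_centre(lower, upper):
    """Mid-point of a class: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

def class_size(lower, upper):
    """Upper class limit minus lower class limit plus one (whole-number data)."""
    return upper - lower + 1

classes = [(30, 34), (35, 39), (40, 44)]
print(class_boundaries(classes))   # boundaries on either side of the class 35 - 39
print(class_centre(30, 34))        # centre of the first class
print(class_size(35, 39))          # size of the second class
```

For the class 35 – 39 this gives the boundaries 34.5 and 39.5 and a class size of 5, matching the values quoted in the text.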

Example 2

A year 11 class was surveyed on their weekly income. The responses are shown below.

a) Complete the table below

Income         Tally    Frequency
$0 - $20
$21 - $40
$41 - $60
$61 - $80
$81 - $100
$101 - $120
Total

b) State the upper and lower class boundaries of the second class.

(i) Lower class boundary = $\frac{20 + 21}{2} = 20.5$

(ii) Upper class boundary = $\frac{40 + 41}{2} = 40.5$

EXERCISE 3.1

1. A class of students was asked to identify the type of car their family owned. Their responses
are shown below.

Put these results into a table.

2. The results of a spelling test done by 30 students are shown below.

Put these results into a frequency distribution table.

3. The marks out of ten for a mental arithmetic test were


9 10 4 9 7 9 3 2 9 5 8 7 1 8 3 6 2 10 5 9
0 8 7 4 6 1 6 5 7 7 5 3 3 5 8 2 6 7 9 8

a) Rearrange the data into a frequency distribution table using a tally column.
b) How many students sat for the test?
c) How many scored full marks?
d) How many scored zero?
e) What percentage of students scored 5 or more?

4. The marks scored on a Math exam, out of 100, by 25 year 11 students are shown below.

Copy and complete the table

Marks      Tally    Frequency
40 – 49
50 – 59
60 – 69
70 – 79
80 – 89
90 – 99
Total

5. The data below show the number of customers that entered a shop in a certain month.

Choose suitable groupings to tabulate these data.

6. A marriage counsellor asked his clients to keep a record of the number of arguments they had
in a one-week period. The results collected were:

No. of arguments   Frequency
0 – 2              8
3 – 5              18
6 – 8              10
9 – 11             3
12 – 14            1

a) How many different responses are included in each class interval?
b) How many clients responded to the counsellor’s request?
c) What percentage of these clients had fewer than 3 arguments in the week?
d) What percentage had more than 8 arguments?
e) Can you determine how many clients had 6 arguments?

STEM AND LEAF PLOT

The main disadvantage of grouped distribution tables is that some information is lost. For example,
it is not possible to determine the lowest and highest scores from the table.

Stem and leaf plots are another way of displaying information. They are used to group and rank data
to show the range and distribution of the data.

The stem is the first digit(s) and the leaf is the final digit of a number.

A stem may have any number of digits but a leaf has exactly one.

Example 1

The results in a mathematics class test are given below

43, 45, 46, 22, 65, 65, 23, 53, 45, 26, 46, 61, 51, 57, 55,
55, 66, 57, 42, 41, 63, 70, 57, 65, 48, 23, 67, 62, 70, 46

a) Draw a stem and leaf plot for this data.

In this stem and leaf plot, the tens digit forms the stem and the units digit forms the leaf.
This means that for the mark 45, the stem is 4 and the leaf is 5.

The leaves are now put into ascending numerical order.

b) What are the lowest and highest scores?

Lowest score = 22, Highest score = 70.

c) How many students scored:

(i) 46? 3 (ii) 50? 0 (iii) 70? 2 (iv) a mark in the sixties? 8
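
The plot for the 30 marks listed in this example can be generated with a short Python sketch. The helper name `stem_and_leaf` is hypothetical, and the code assumes non-negative whole numbers with a single-digit leaf.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its leaves in ascending order."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    # Keep empty stems so that gaps in the data remain visible.
    return {stem: plot.get(stem, []) for stem in range(min(plot), max(plot) + 1)}

marks = [43, 45, 46, 22, 65, 65, 23, 53, 45, 26, 46, 61, 51, 57, 55,
         55, 66, 57, 42, 41, 63, 70, 57, 65, 48, 23, 67, 62, 70, 46]

for stem, leaves in stem_and_leaf(marks).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Reading the rows confirms the answers above: the first leaf on stem 2 gives the lowest score (22), stem 7 holds two leaves of 0, and stem 6 has eight leaves.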

With a stem and leaf plot

• All of the data is used and displayed.
• The largest and smallest measurements can be found.
• The clustering of data can be more easily seen.
• The length of the leaf row indicates the number of scores belonging to that stem.
• The plot itself gives a graphical representation of the spread of data.

EXERCISE 3.2

1. a) Draw a stem and leaf plot, using the stems 3, 4, 5 and 6 for the scores

b) What are the lowest and highest scores?


c) How many times does the score i. 50 and ii. 40 occur?
d) Which score occurs the most often?
e) How many scores are in the sixties?
f) How many scores are less than 50?

2. a) Draw a stem and leaf plot, using the stems 12, 13, 14, and 15, for the scores

b) What are the lowest and highest scores?
c) How many times does the score i. 50 ii. 60 iii. 70 and iv. 80 occur?
d) Which score occurs the most often?
e) How many scores are i. less than 130 ii. 150 or more?

3. Draw stem and leaf plots for the scores

a) 16, 24, 13, 8, 22, 4, 5, 26, 14, 10, 2, 20, 11, 23, 8, 7, 24, 8, 12, 9

b) 263, 258, 281, 265, 274, 270, 283, 270, 254, 265, 280, 274, 256, 280, 251, 263, 280, 278, 259, 260

4. The following stem and leaf plot shows the time spent (hours) watching TV by a group of students
during one week.

a) How many students were surveyed?

b) What were the least and greatest numbers of hours of TV watched?

c) How many students watched less than 10 hours of TV per week?

d) How many students watched more than 30 hours of TV a week?

e) Draw a grouped frequency table to represent this data, using class intervals of

i. 1 – 5, 6 – 10, 11 – 15, etc

f) Comment on the advantages and disadvantages of the stem and leaf plot compared with the
grouped frequency tables.

DISPLAYING DATA

The most common way of displaying data is by using a graph. Different graphs have different
purposes. Common graphs used to highlight the trends in collected data include column graphs,
sector graphs (pie), histograms and stem-and-leaf plots. The aim of using graphs to display data is
to present the information in a way which is visually attractive. Although there is some loss of detail
compared with a table, it is often easier to see trends and relationships. Often a table and a graph
are used together to convey the maximum amount of information.

Column graphs

A column graph is used when we wish to show a quantity. Categories are written on the horizontal
axis and frequencies on the vertical axis.

Example 1

The table below shows the result of the survey on favourite sports.

Show this information in a column graph.

• Draw the horizontal axis showing each sport.
• Draw a vertical axis to show frequencies up to 7.
• Draw the columns all the same width with gaps between.
• Use a ruler.
• Label the axes.
• Give the graph a title.

Sector graphs

A sector graph or pie graph is used when we want to display a comparison of quantities. An angle is
drawn at the centre of the circle that is the same fraction of 360° as the fraction of people making
each response. The area of each sector is proportional to the size of each category and hence each
sector angle is proportional to the size of each category.
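
The sector angle for a category is its frequency divided by the total frequency, multiplied by 360°. A minimal Python sketch, using a made-up set of responses (the dictionary `responses` and the name `sector_angles` are illustrative only):

```python
def sector_angles(frequencies):
    """Return the sector angle in degrees for each category."""
    total = sum(frequencies.values())
    return {category: 360 * count / total
            for category, count in frequencies.items()}

# Made-up survey responses, for illustration only.
responses = {"Yes": 12, "No": 6, "Unsure": 6}

for category, angle in sector_angles(responses).items():
    print(f"{category}: {angle} degrees")
```

Half of the 24 responses are "Yes", so that sector takes half of the circle (180°). The angles always add up to 360°.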

Example 2

For the above table draw a sector graph.

[Sector graph titled "Sports", with sectors for AFL, Cricket, Rugby league, Rugby union, Soccer, Basketball, Netball and Tennis]

EXERCISE 3.3

1. Draw a column graph to display the data given below.

2. Draw a sector graph to display the data given below.

3. The bar chart at right shows the marital status of respondents to a survey.

a) How many people responded to the survey?


b) What was the most common marital status?
c) How many people were married?
d) How many respondents were either divorced or separated?
e) How many people had been married for some time?

4. The table shows the percentage of the workforce in each of the industry categories given.

Industry                % of workforce
Agriculture             5
Manufacturing           26
Construction            12
Hospitality             35
Finance                 16
Public Administration   6

a) For a sector graph, calculate the size of the sector angle for each category.
b) Draw a sector graph to illustrate this information.

FREQUENCY HISTOGRAM

A frequency histogram is similar to a column graph with the following essential features.

1. Gaps are never left between the columns, except for a half-unit space before the first
column.
2. If the chart is coloured or shaded, then it is done all in one colour.
3. Frequency is always plotted on the vertical axis.
4. For ungrouped data the horizontal scale is marked so that the data labels appear under the
centre of each column. For grouped data the horizontal scale is marked so that the class
centre of each class appears under the centre of the column.

Histograms provide a pictorial representation of the distribution of results, providing information at
a glance that is not obvious from the listed data.

EXAMPLE 1

The table below shows the number of people living in each house in a street.

No. of people   Frequency
1               1
2               4
3               10
4               15
5               8

Show this information in a frequency histogram.

• Draw a set of axes with the number of people living in a house on the horizontal
axis and frequency on the vertical axis.
• Draw the graph, leaving half a column width space before the first column.

A frequency polygon is a line graph that can be drawn by joining the centres of the tops of the
columns of the histogram. The polygon starts and finishes on the horizontal axis, a half column width
beyond the group boundary of the first and last groups. The frequency polygon highlights
changes in the distribution.
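
The points joined by a frequency polygon can be listed with a short Python sketch, using the table from Example 1. The function name `polygon_vertices` is hypothetical, and the code assumes equally spaced scores (or class centres) with columns one spacing wide.

```python
def polygon_vertices(scores, frequencies, width=1):
    """Points of the frequency polygon, including the two points on the axis."""
    # Half a column from the centre reaches the group boundary; another half
    # column beyond the boundary is one full column width from the centre.
    start = (scores[0] - width, 0)
    end = (scores[-1] + width, 0)
    return [start, *zip(scores, frequencies), end]

people = [1, 2, 3, 4, 5]
frequency = [1, 4, 10, 15, 8]

print(polygon_vertices(people, frequency))
```

For this table the polygon starts at 0 and ends at 6 on the horizontal axis, and passes through the centre of the top of each column in between.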

Example 2

The frequency table below shows a class set of marks on an exam. Draw a frequency histogram and
polygon on the same axis.

Mark        Class centre   Frequency
51 – 60     55.5           3
61 – 70     65.5           5
71 – 80     75.5           12
81 – 90     85.5           7
91 – 100    95.5           3

• Draw a set of axes with the exam mark on the horizontal axis and frequency
on the vertical axis. Show the class centres for the exam marks.
• Draw the columns, leaving a half-column-width space before the first column.
• Make sure the line graph begins and ends on the horizontal axis.

EXERCISE 3.4

1. Figure 3.1 is a histogram showing the distribution of the number
of goals scored by a soccer team in 30 matches.

a) In how many matches were exactly 3 goals scored?


b) What was the most common number of goals scored
in a match?
c) In how many matches were more than 3 goals scored?
d) How many matches were played where no goals were
scored by this team?

2. A survey is done on young drivers taking the written test for their licence. The number of
mistakes each makes is recorded and the results are shown in the frequency distribution table
below. Show this information in a frequency histogram.

No. of mistakes   Frequency
0                 5
1                 8
2                 11
3                 4
4                 3
5                 1

3. The label on a box of matches states that the average contents of a box is 50 matches.
Quality control surveyed 50 boxes for the number of matches and the results are shown
below.

a) Put this information into a frequency table.

b) Show the results in a frequency histogram and polygon.

Relative Frequency

The relative frequency of a particular score (or class interval) indicates the significance of that
score (or interval) when compared to the entire distribution. The relative frequency is the
fraction of times that the score occurs, that is, its frequency divided by the total number of scores.

For the purpose of comparison, it is usually convenient to express this fraction as a percentage
(sometimes called the percentage relative frequency).

A relative frequency column is easily incorporated into a frequency table. Relative frequency
histogram and relative frequency polygons enable the information to be presented visually as shown
below.

Cumulative Frequency

A frequency table may be extended to introduce a cumulative frequency column that keeps a
running total of the number of scores included.

An example of a cumulative frequency table is provided below.


Note: the last entry in the cumulative frequency column is actually equal to the total number of
scores.

SCORE   FREQUENCY   CUMULATIVE FREQUENCY
20      1           1
21      2           3
22      3           6
23      4           10
24      4           14
25      1           15
26      1           16
TOTAL   16
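
The running total in the cumulative frequency column can be reproduced with a short Python sketch. It uses the scores and frequencies from the table above; `itertools.accumulate` from the standard library is one convenient way to keep the running total.

```python
from itertools import accumulate

scores = [20, 21, 22, 23, 24, 25, 26]
frequencies = [1, 2, 3, 4, 4, 1, 1]

# Each entry is the sum of all frequencies up to and including that score.
cumulative = list(accumulate(frequencies))

for score, freq, running_total in zip(scores, frequencies, cumulative):
    print(f"{score:>5} {freq:>9} {running_total:>10}")

# The last entry equals the total number of scores.
print(cumulative[-1] == sum(frequencies))
```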

The information in the table may be represented using a histogram.
A cumulative frequency polygon is obtained by connecting the end points of each bar on the
histogram as shown.

Note: The cumulative frequency curve is normally referred to as the OGIVE Curve.

The ogive differs from other polygons in two ways.

1. It is obtained from the end points of the histogram bars.
2. It is represented as a smooth curve.

Example 1

Number of Candychocs   Frequency
36                     1
37                     5
38                     8
39                     13
40                     7
41                     4
42                     2

a. Copy the frequency distribution table and
(i) Add a cumulative frequency column.
(ii) Add a relative frequency column.

b. How many packets contained
(i) 40 or less (ii) less than 38 Candychocs?

c. What fraction of packets contained
(i) 37 (ii) 40 Candychocs?

d. What percentage of packets contained
(i) 38 (ii) 39 Candychocs?

a)
Number of Candychocs   Frequency   Cumulative frequency   Relative frequency
36                     1           1                      $\frac{1}{40} = 2.5\%$
37                     5           6                      $\frac{5}{40} = 12.5\%$
38                     8           14                     $\frac{8}{40} = 20\%$
39                     13          27                     $\frac{13}{40} = 32.5\%$
40                     7           34                     $\frac{7}{40} = 17.5\%$
41                     4           38                     $\frac{4}{40} = 10\%$
42                     2           40                     $\frac{2}{40} = 5\%$
Total                  40                                 1 = 100%

b) i. Number of packets containing 40 or less Candychocs
= cumulative frequency of the score 40
= 34

ii. Number of packets containing less than 38 Candychocs
= number of packets containing 36 or 37 Candychocs
= cumulative frequency of 37
= 6

c) i. $\frac{5}{40} = \frac{1}{8}$ ii. $\frac{7}{40}$

d) i. 20% ii. 32.5%
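
The answers to this example can be checked with a short Python sketch. It uses the Candychocs frequencies from the example. Exact fractions from the standard `fractions` module are used so that the relative frequencies appear in lowest terms; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

candychocs = {36: 1, 37: 5, 38: 8, 39: 13, 40: 7, 41: 4, 42: 2}
total = sum(candychocs.values())

# Running totals, keyed by the number of Candychocs in a packet.
cumulative = dict(zip(candychocs, accumulate(candychocs.values())))

# Relative frequency = frequency / total, kept as an exact fraction.
relative = {n: Fraction(freq, total) for n, freq in candychocs.items()}
percentage = {n: float(fraction * 100) for n, fraction in relative.items()}

print(cumulative[40], cumulative[37])    # part b
print(relative[37], relative[40])        # part c
print(percentage[38], percentage[39])    # part d
```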

USE OF THE CUMULATIVE FREQUENCY CURVE

The cumulative frequency curve provides a useful tool for extracting information about the
distribution of results within a set of data.

A cumulative frequency curve (i.e. an ogive curve) is provided in the frame below. Familiarize
yourself with the graph then answer the questions that follow.

EXERCISE 3.5

1. a) What mark was the test out of? b) How many students did the test?
2. Draw up a frequency table using the information provided on the graph.

Study the examples next to the graph, then answer the following.

3. How many students scored marks in the range indicated below?


a) ≤4 b) ≤9 c) ≤10 d) < 4 e) < 10 f) < 1

4. How many students scored marks in the range indicated below?

Eg. i. For marks ≥ 5,

We consider the marks <5 and subtract from the total number of students.

Ans: 26 – 5 = 21

ii. For marks > 6

we consider the marks ≤ 6 and subtract from the total number of students.

Ans: 26 – 11 = 15

a) ≥6 b) ≥ 3 c) ≥8 d) ≥ 10 e) > 5 f) > 3 g) > 0
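
The subtraction rule used in the two worked examples of question 4 can be sketched in Python. The data behind the ogive are not reproduced in these notes, so this sketch reuses the score and frequency table from the Cumulative Frequency section instead; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

scores = [20, 21, 22, 23, 24, 25, 26]
frequencies = [1, 2, 3, 4, 4, 1, 1]
cumulative = [0, *accumulate(frequencies)]   # cumulative[i] counts the first i scores
total = cumulative[-1]

def count_at_most(x):
    """Number of scores <= x, read from the cumulative frequencies."""
    return cumulative[bisect_right(scores, x)]

def count_below(x):
    """Number of scores < x."""
    return cumulative[bisect_left(scores, x)]

def count_at_least(x):
    """Scores >= x: the total minus the scores below x."""
    return total - count_below(x)

def count_above(x):
    """Scores > x: the total minus the scores at most x."""
    return total - count_at_most(x)

print(count_at_least(23))   # 16 - 6
print(count_above(24))      # 16 - 14
```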

5. If the students’ marks are arranged in ascending order, give the range of marks obtained by:

a) first 5 students b) the first 10 students

6. How many students had marks in the range
a) 1 to 4 b) 5 to 10 c) 3 to 8

7. What percentage of students had marks
a) Below 7 b) ≤7 c) <5

8. Give the range of marks obtained by the top 50% of students.


9. The number of goals scored by the goal shooter in a netball team for a session is shown in the
frequency table.

Goals/match   Frequency
5             4
6             3
7             7
8             5
9             0
10            3
11            2
12            1

a) Copy the frequency distribution table and
i) Add a cumulative frequency column
ii) Add a relative frequency column

b) In how many games did she score i) 7 or less ii) less than 10 goals?

c) In what fraction of games did she score i. 8 ii. 5 goals?

d) In what percentage of games did she score i. 6 ii. Goals?

INTERPRETATION OF DATA – MEASURES OF CENTRAL TENDENCY

Frequency tables and associated graphs assist in establishing trends present in sets of data; however,
there are other measures available which attempt to refine the data into single representative
scores.

The three most common measures of central tendency are:

a) the mean b) the mode c) the median

Other measures that further identify characteristics of a particular distribution are:

a) the range and b) the standard deviation

The five measures are explained in the following sections.

EXERCISE 3.6

Determine the mean, median and modal values for the following sets of data.

Eg: for the scores 7, 9, 2, 2, 5, 11

Mode = 2 (it appears twice)

Median: arranging in ascending order gives 2, 2, 5, 7, 9, 11. The two middle scores are 5 and 7,
so the median = $\frac{5 + 7}{2} = 6$.

a) 1, 1, 3, 5, 5, 5, 6, 9, 10, 10
b) 12, 18, 15, 10, 18, 20, 6, 11, 9, 18, 0
c) -2, -5, -4, -3, -2, -8, -2, -6, -4
d) $\frac{1}{2}, \frac{1}{4}, \frac{3}{4}, 0, 1, \frac{1}{2}$
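
Answers can be checked with Python's standard `statistics` module. The sketch below uses the scores 7, 9, 2, 2, 5, 11 from the worked example; this is one possible way of checking, not a method prescribed by these notes.

```python
import statistics

scores = [7, 9, 2, 2, 5, 11]

print(sorted(scores))               # scores in ascending order
print(statistics.mean(scores))      # sum of the scores divided by how many there are
print(statistics.median(scores))    # average of the two middle scores
print(statistics.mode(scores))      # the score that occurs most often
```

For these six scores the sum is 36, so the mean is 6. The two middle scores are 5 and 7, so the median is also 6.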

USING YOUR CALCULATOR TO FIND THE MEAN

Given below is one way of finding the mean using a CASIO calculator. Check the instruction booklet
to determine the appropriate steps for your calculator.

Follow these steps

Step 1: Set the calculator to statistics mode (SD) by pressing MODE and the key for statistics
functions.

Step 2: Make sure the statistics memory is clear by pressing SHIFT AC

Step 3: Enter the first score and press the M+ key. Repeat for each score.

Step 4: When all the scores have been entered, press the appropriate key for the mean $\bar{x}$.

CALCULATION OF THE MEAN USING A FREQUENCY TABLE – UNGROUPED DATA

When the results are arranged in the form of a frequency table, the mean can be found readily using
a series of sub totals as shown.
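
The sub totals are the products score × frequency. Their sum divided by the total frequency gives the mean. A minimal Python sketch follows; it borrows the score and frequency columns from the median example later in these notes, and `mean_from_table` is a hypothetical name.

```python
def mean_from_table(table):
    """Mean of ungrouped data given as (score, frequency) pairs."""
    sub_totals = [score * freq for score, freq in table]   # the f x column
    n = sum(freq for _, freq in table)                     # total number of scores
    return sum(sub_totals) / n

table = [(2, 3), (4, 7), (5, 2), (7, 3), (9, 3), (10, 2)]

print(mean_from_table(table))
```

Here the sub totals are 6, 28, 10, 21, 27 and 20, which add to 112. With 20 scores the mean is 112 ÷ 20 = 5.6.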

Exercise: Copy and complete the table and so determine the mean. (round the answers to 4
significant figures).

a)                   b)                   c)
Score   Frequency    Score   Frequency    Score   Frequency
2       2            21      2            55      2
5       5            22      8            60      8
8       4            23      12           65      9
7       6            24      15           70      8
9       3            25      13           75      3
10      5            26      10           80      5

DETERMINATION OF THE MEDIAN FROM A FREQUENCY TABLE – UNGROUPED DATA.

The median score is readily determined from a frequency table, provided a cumulative frequency
column is used.

The following rules lead to the median score of a distribution:

A. For an odd number of scores, the middle score is the one located in the position given by
$\frac{N + 1}{2}$, where ‘N’ represents the total number of scores.

Eg. For 13 scores, the median is the score in the $\frac{13 + 1}{2}$ or 7th position.

B. For an even number of scores, the middle score is taken as the average of the scores in positions
$\frac{N}{2}$ and $\frac{N}{2} + 1$.

Eg. For 10 scores, the median is found by averaging the scores in the positions $\frac{10}{2}$ and
$\frac{10}{2} + 1$, i.e. the 5th and 6th scores.

Once the location of the middle score(s) is determined the cumulative frequency table can be scanned
to identify its value.

Example

Score   Frequency   Cumulative Frequency   Position
2       3           3                      1st – 3rd
4       7           10                     4th – 10th
5       2           12                     11th – 12th
7       3           15                     13th – 15th
9       3           18                     16th – 18th
10      2           20                     19th – 20th
N = 20

Note: since there are 20 scores, the median is the average of the 10th and 11th scores.

The 10th score is 4 and the 11th score is 5.

Median = $\frac{4 + 5}{2} = 4.5$
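
The position rules A and B can be sketched in Python, using the table from this example. The helper names `score_at` and `median_from_table` are hypothetical, and the scores are assumed to be listed in ascending order.

```python
from itertools import accumulate

def score_at(position, scores, cumulative):
    """Return the score occupying a given position (1st, 2nd, ...) in the ordered data."""
    for score, running_total in zip(scores, cumulative):
        if position <= running_total:
            return score
    raise ValueError("position exceeds the number of scores")

def median_from_table(scores, frequencies):
    """Median of ungrouped data, located with the cumulative frequency column."""
    cumulative = list(accumulate(frequencies))
    n = cumulative[-1]
    if n % 2 == 1:
        # Rule A: odd N, take the score in position (N + 1) / 2.
        return score_at((n + 1) // 2, scores, cumulative)
    # Rule B: even N, average the scores in positions N / 2 and N / 2 + 1.
    lower = score_at(n // 2, scores, cumulative)
    upper = score_at(n // 2 + 1, scores, cumulative)
    return (lower + upper) / 2

scores = [2, 4, 5, 7, 9, 10]
frequencies = [3, 7, 2, 3, 3, 2]

print(median_from_table(scores, frequencies))
```

With N = 20 the function averages the 10th score (4) and the 11th score (5), giving 4.5 as in the example.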
