Data
Q1. Classify each of the following data using two selections from the following descriptive words: categorical,
numerical, nominal, ordinal, discrete and continuous.
a) The number of students absent from school. Answer: numerical and discrete
b) The types of vehicles using a certain road. Answer: categorical and nominal
c) The various pizza sizes available at a local takeaway. Answer: categorical and ordinal
d) The room temperature at various times during a particular day. Answer: numerical and continuous
Q2. Match each word with its correct meaning:
a) Discrete i) placed in categories or classes
b) Categorical ii) counted in exact values
c) Ordinal iii) data in the form of numbers
d) Continuous iv) needs further names to complete the description
e) Numerical v) needs a ranking order
f) Nominal vi) measured in decimal numbers
Q3. Classify each of the following data using two words selected from the following descriptive words:
categorical, numerical, nominal, ordinal, discrete and continuous.
a) The population of your town or city
b) The types of motorbike in a parking lot
c) The heights of people in an identification line-up
d) The masses of babies in a group
e) The languages spoken at home by students in your class
f) The time spent watching TV
g) The number of children in the families in your suburb
h) The air pressure in your car’s tyres
i) The number of puppies in a litter
j) The types of radio program listened to by teenagers
k) The times for swimming 50 metres
l) The quantity of fish caught in a net
m) The number of CDs you own
n) The types of shops in a shopping centre
o) The football competition ladder at the end of each round
p) The lifetime of torch batteries
q) The number of people attending a rock concert
r) Exam grades
s) The types of magazine sold at a newsagency
t) Hotel accommodation rating
Data collection
One of the first decisions to be made when collecting data is from whom, or from what, the
information is to be collected. There are two types of data collection: a census and a sample survey.
Census: A census involves collecting information from every individual in the whole population.
The population is all the people or objects you want data about. For example, if all new cars are
tested before sale, this is a census. The Australian Bureau of Statistics (ABS) conducts a census of
the entire population of Australia every 5 years. A census is accurate and detailed, but also
expensive, time consuming and often impractical.
Sample survey: A sample survey involves collecting data from only a part of the population. It is
cheaper and quicker than a census, but not as detailed or accurate. Conclusions drawn from sample
surveys always involve some error. Often this error is due to a bias in the sample or method of data
collection.
EXAMPLE: State whether a census or a sample survey would be used to investigate each situation.
Solution:
a) Sample: It would obviously be impractical to test every light globe produced until it failed—there would be
none to sell!
b) Census: An accurate analysis of all accidents would be important.
c) Sample: It would be very time consuming and expensive to interview the whole population to find out who
uses Bright Teeth toothpaste.
Do the following exercise:
1. State whether a census or sample survey would be used for each of these investigations. Discuss your
answers in groups.
a) the number of goals scored each week by a netball team
b) the heights of the members of a football team
c) the most popular radio station
d) the number of children in an Australian family
e) the number of loaves of bread bought each week by a family
Solution:
a) The sample would be biased towards people who are at home during the day and have a phone. It does
not include people who go to work in the daytime or do not have a phone.
b) The sample would be biased towards people who catch the train at that station. It does not include
people who use other forms of transport or do not travel or use a different station.
c) The sample would be biased towards people who attend football matches. For example, there would
probably be more males than females at football matches.
d) The sample would be biased by the characteristics of the ten people. A larger sample is needed.
h) A survey of 20 people indicates that 80% of people watch the Channel 9 News.
Solution:
a Relevant question, but it should include ‘usually’ to eliminate the once-in-a-while meals that are not typical.
The expected responses would be Yes or No.
b Somewhat relevant question, but it should be reworded to ask the usual location of the evening meal. The
expected responses might include: at the dining table, in front of the TV, etc.
c Irrelevant question. It also assumes that the answer to the previous question was yes.
d Relevant question, but it needs to be reworded. Does it mean in a week, or over a month? Does the question
refer to the evening meal only, or does it include other meals as well?
e Relevant question, but it may be difficult to use the responses. The question needs to include ‘usual’ and be
restructured to include choices like: ‘red meat and vegetables’, ‘chicken and vegetables’, ‘vegetarian’, etc.
f Relevant question, but it needs to include time slots to tick.
Q1: These questions have been suggested for use in a survey about the different methods of transport used by
people travelling to work. Comment on their appropriateness and possible responses. Reword if necessary.
a Do you own a car?
b What colour are trains?
c How often do you drive to work?
d What type of public transport do you use?
e How often do you travel to work by public transport?
Q2 These questions have been suggested for use in a survey about school uniforms. Comment on their
appropriateness and possible responses. Reword if necessary.
a Do you like the present uniform?
b Do you want to wear a uniform?
c What is your favourite colour?
d How old are you?
e Have you attended a school that doesn’t have a uniform?
f Is your uniform comfortable?
Q3 These questions have been suggested for use in a survey of what people watch on TV. Comment on their
appropriateness and possible responses. Reword if necessary.
a Do you own a TV?
b Do you watch TV?
c How many TVs are there in your house?
d What is your favourite program?
e Do you like sport?
f Which is your favourite channel?
g Do you lie down while watching TV?
Types of questions
The main types of questions are:
• Free-response or open-ended questions, like ‘What is your favourite TV program?’ The person answers the
question in their own words.
• Yes or No questions, like ‘Did you watch program X last week?’ The person answers Yes or No to the
question.
• True or False questions, which are similar to Yes or No questions.
• Tick-box questions, like ‘What do you like watching on TV? Tick one or more boxes.’
□ Nothing □ News □ Sport □ Drama □ Comedy □ Soapies □ Other
Options must include all possible responses.
• Scaled-response questions, like ‘Do you think that there should be more Australian programs on TV?
Circle a number.’
1 Strongly disagree 2 Disagree 3 Don’t know 4 Agree 5 Strongly agree
The number of options must be odd and there must be a neutral option.
3 Consider the illustrations shown below. State whether each is an example of primary or secondary data.
Survey: The process of collecting data is called a survey.
Displaying/organising data: Once the data is collected, it can be organised in the form of tables and
graphs.
The process of organising data in a table is called tabulating.
The table in which the data is organised is called a frequency distribution table.
A frequency distribution table is a table that displays the frequency (the number of times each piece
of data occurs) for each of the categories of data.
Tally marks are often used to help record the data in the table.
Every fifth tally mark is placed through the four preceding tally marks to make counting easier.
Example 1: A census is taken of a Year 7 class. The method by which the students travelled to school on a
particular day is recorded below using the code: Walk (W), Cycle (C), Bus (B), Train (T) and Car (M).
W C B W C B B B W B B B C B T C M C B T M M T M M M W C C B
Rearrange this information into a frequency distribution table using a tally column.
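The tallying in Example 1 can also be checked with a short program. The sketch below is one possible way to do this in Python, not the worksheet's own method (which is a hand-drawn tally). The names `codes`, `labels` and `tally_marks` are illustrative, and each completed group of five is written as `||||/` to stand in for four strokes crossed by a fifth.

```python
from collections import Counter

# Raw data from Example 1, one code letter per student.
codes = "W C B W C B B B W B B B C B T C M C B T M M T M M M W C C B".split()

# Code letters as given in the example.
labels = {"W": "Walk", "C": "Cycle", "B": "Bus", "T": "Train", "M": "Car"}


def tally_marks(n):
    """Return tally marks for n, writing each complete group of five as '||||/'."""
    groups = ["||||/"] * (n // 5)
    if n % 5:
        groups.append("|" * (n % 5))
    return " ".join(groups)


# Count how many times each code occurs (its frequency).
frequency = Counter(codes)

print(f"{'Method':<8}{'Tally':<14}{'Frequency':>9}")
for code, name in labels.items():
    print(f"{name:<8}{tally_marks(frequency[code]):<14}{frequency[code]:>9}")
print(f"{'Total':<22}{sum(frequency.values()):>9}")
```

The frequencies should add up to 30, the number of students in the class, which is a quick check that no score was missed.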
Code: Motorcycle (M), Car (C), Bus (B) and Truck (T)
Rearrange this information into a frequency distribution table using a tally column.
Q2. A particular class was surveyed to find out the number of pets per household and the data were recorded.
The raw data were: 0, 3, 1, 2, 0, 1, 0, 1, 2, 4, 0, 6, 1, 1, 0, 2, 2, 0, 1, 3, 0, 1, 2, 1, 1, 2.
a) Organise the data into a frequency distribution table.
Q3. If the tally is fairly simple, the frequency table may be simplified to two columns. Use the simpler table
at right to answer these questions.
Column graphs
A column graph is useful for comparing facts. The columns provide a visual display for
comparing quantities in different categories. Column graphs help us to see relationships
quickly.
When constructing column graphs, they should be drawn on graph paper and have:
1. a title or name
2. labelled axes which are clearly and evenly scaled
3. columns of the same width
4. an even gap between each column
5. the first column beginning half a unit (that is, half the column width) from the vertical axis.
Example 1: The column graph below shows the results of a survey of people in a street asking them their
favourite car colour.
a) The scale of this graph's vertical axis is: 1 unit = ---------- people
b) What is the title of the graph?
c) What is represented by the horizontal axis?
d) What is the most popular car colour?
e) How many more people preferred white than green?
f) Is the statement “Red is at least twice as popular as blue” true or false?
g) Choose the most correct: “Nobody chose silver as their favourite colour.”
The graph was incomplete. 18 people said that yellow was their favourite colour. Draw your own
column on the graph. Colour all columns.
h) How many people were surveyed? (Include the people who chose yellow.)
i) What fraction of people preferred white or red cars?
Q2. This column graph represents the Jumpin’ Jeans company’s profits.
a) Which year showed the highest profit? How much was it?
b) In which year did losses start? What was the loss that year?
c) What was the profit or loss for 2003?
d) i) Find the total profits and the total losses.
ii) Calculate the company’s overall profit/loss over the
period shown.
a) In which year was the only improvement made and by how much?
Q5. A survey of houses in Statistics Street produced the data shown in the table below.
Number of bedrooms    Number of houses
2                     3
3                     10
4                     6
5                     2
a) Select a suitable title and draw a column graph to display the data. Label the vertical axis
Number of houses and the horizontal axis Number of bedrooms.
b) Find the most common number of bedrooms in the houses of Statistics Street.
Line Graphs
Line graphs are used to display data or information that changes continuously over time. Line graphs allow
us to see overall trends such as an increase or decrease in data over time.
When constructing line graphs they must be drawn on graph paper and include:
1. a title
2. a horizontal axis that is evenly scaled and labelled (usually as time)
3. a vertical axis that is evenly scaled and labelled
4. a line or smooth curve that joins successive plotted points.
Example1. This table shows the variation in price of a share for 1 week in 2012. The prices were taken at the
close of trading at 4 pm.
a) Put time (day of the week) on the horizontal axis and price on the vertical axis.
Step 1: Mark an appropriate scale for the horizontal axis.
Step 2: Mark an appropriate scale for the vertical axis.
Step 3: Write a heading and label each axis.
Step 4: Plot the points and join them with straight lines.
b) Step 1: Locate the position ‘halfway through trading
on Tuesday’ on the horizontal axis.
Step 2: Rule a line up to the graph, then across to the
vertical axis.
Step 3: Read the value off the vertical axis: about 46 cents.
1. The height of a seedling was measured at the same time each day over a week.
iv) The first two coordinates to plot are (0, 4) and (1, 6). Write the remaining coordinates.
b) What was the initial height of the seedling? On what day was this?
c) Using the graph, estimate the height of the seedling after 5.5 days.
3 This graph shows the outside temperature over a 24-hour period that starts at midnight.
a) What was the temperature at midday?
d) Use the graph to estimate the temperature at these times of the day.
i 4:00 am
ii 9:00 am
iii 1:00 pm
iv 5:00 pm
4. Oliver measures his pet dog’s weight over the course of a year. He gets the following results.
a) Draw a line graph showing this information, making sure the vertical axis has an equal scale from 0 kg to
10 kg.
b) Describe any trends or patterns that you see.
c) Oliver put his dog on a weight loss diet for a period of 3 months. When do you think the dog started the
diet? Justify your answer.
Example: Interpreting a travel graph
This travel graph shows the distance travelled by a cyclist over 5 hours.
a) How far did the cyclist travel in total?
b) How far did the cyclist travel in the first hour?
c) What is happening in the second hour?
d) When is the cyclist travelling the fastest?
e) In the fifth hour, how far does the cyclist travel?
Solution:
a 30 km
b 15 km
c At rest
d In the first hour. This is the steepest part of the graph.
e 5 km
2. This travel graph shows the distance travelled by a cyclist over 5 hours.
a) How far did the cyclist ride in total?
b) How far did the cyclist ride in the second hour?
c) During which hour did the cyclist ride the fastest?
d) For how long did the cyclist rest?
■ A frequency histogram is a graphical representation of a frequency distribution table. It can be used when
the items are numerical.
■ The vertical axis (y-axis) is used to represent the frequency of each item.
■ Columns are placed next to one another with no gaps in between.
■ A half-column-width space is placed between the vertical axis and the first column of the histogram.
■ A frequency polygon is formed by joining the centres of each column in the histogram. It begins and ends
on the horizontal axis.
Dot plots
A dot plot is a simple graphical way to present a small amount of data.
Each score in the data set is marked with a dot on a number line.
A dot plot is able to convey information more simply and clearly than a column graph.
It is especially suitable when there are a large number of categories to be displayed.
Example1- A group of movie critics are asked to give a new movie a rating of between 1 and 5 stars. The
results were 5, 3, 4, 1, 4, 5, 3, 2, 3, 4. Show this information on a dot plot.
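A dot plot like the one asked for here can be sketched as text. The code below is only an illustration under simple assumptions: the function name `dot_plot` is made up for this sketch, each score is shown as a `*`, and the layout lines up properly only for single-digit scores.

```python
from collections import Counter

# Star ratings from Example 1.
ratings = [5, 3, 4, 1, 4, 5, 3, 2, 3, 4]


def dot_plot(scores):
    """Build a text dot plot: one '*' per score, stacked above a number line."""
    counts = Counter(scores)
    values = range(min(scores), max(scores) + 1)
    rows = []
    # Work from the tallest stack down to the bottom row of dots.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in values)
        rows.append(row.rstrip())
    # The number line goes underneath the dots.
    rows.append(" ".join(str(v) for v in values))
    return "\n".join(rows)


print(dot_plot(ratings))
```

Each column of stars sits above its rating, so the tallest columns show the most common ratings.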
Example2 – Over a 2-week period, the number of packets of potato chips sold from a vending machine each
day was recorded:
10, 8, 12, 11, 12, 18, 13, 11, 12, 11, 12, 12, 13, 14.
Q1. The number of goals scored by a soccer team over a season is given below.
0, 2, 3, 1, 2, 5, 4, 1, 2, 0, 2, 3, 1, 1, 1,
a) Display the data in a dot plot.
b) Comment on the distribution.
Q2. Draw a dot plot for each of the following sets of data:
a) 2, 0, 5, 1, 3, 3, 2, 1, 2, 3
c) 49, 52, 60, 55, 57, 60, 52, 66, 49, 53, 61, 57, 66, 62, 64, 48, 51, 60.
Q3. Melanie played 22 competition basketball games last year. She threw these numbers of goals:
4, 5, 1, 2, 3, 0, 3, 9, 4, 6, 5, 4, 1, 1, 4, 4, 2, 5, 3, 1, 1, 0
a) Draw a dot plot representing this data.
b) Find the total number of goals she threw for the year.
c) Is there an outlier in this data?
Q4. Use the dot plot shown to complete the table.
Stem-and-leaf plots
Each piece of data in a stem plot is made up of two components: a stem and a leaf. For
example, the value 28 is made up of a tens component (the stem) and the units component (the
leaf) and would be written as:
2 | 8 = 28
Example 1. Prepare an ordered stem-and-leaf plot for each of the following sets of data:
a) 129, 148, 137, 125, 148, 163, 152, 158, 172, 139, 168, 121, 134.
b) 1.6, 0.8, 0.7, 1.2, 1.9, 2.3, 2.8, 2.1, 1.6, 3.1, 2.9, 0.1, 4.3, 3.7, 2.6.
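The ordering and grouping needed for part a) can be sketched in a few lines of code. This is only one possible approach, assuming whole-number data where the last digit is the leaf and the remaining digits are the stem; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Data from part a) of the example.
data = [129, 148, 137, 125, 148, 163, 152, 158, 172, 139, 168, 121, 134]


def stem_and_leaf(values):
    """Group whole numbers by stem (all digits but the last) with ordered leaves."""
    plot = defaultdict(list)
    for value in sorted(values):          # sorting first keeps the leaves in order
        stem, leaf = divmod(value, 10)    # e.g. 129 -> stem 12, leaf 9
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(data)

print("Key: 12 | 9 = 129")
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>3} | {leaves}")
```

For part b), the decimals could first be multiplied by 10 and rounded (so 1.6 becomes 16), with the key changed to 1 | 6 = 1.6.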
Q1. The following stem-and-leaf plot gives the age of members of a theatrical group.
Q3. Prepare an ordered stem-and-leaf plot for each of the following sets of data:
a) 1.2, 3.9, 5.8, 4.6, 4.1, 2.2, 2.8, 1.7, 5.4, 2.3, 1.9
b) 207, 205, 255, 190, 248, 248, 248, 237, 225, 239, 208, 244
c) 14.8, 15.2, 13.8, 13.0, 14.5, 16.2, 15.7, 14.7, 14.3, 15.6, 14.6, 13.9, 14.7, 15.1, 15.9, 13.9, 14.5
If a student is asked to describe how much time they spend each evening doing different activities, they could
present their results as either a sector graph or a divided bar graph:
■ To calculate the size of each section of the graph, divide the value in a given category by the sum of all
category values. This gives the category’s proportion or fraction.
■ To draw a sector graph (also called a pie chart), multiply each category’s proportion or fraction by 360°
and draw a sector of that size.
■ To draw a divided bar graph, multiply each category’s proportion or fraction by the total width of the
rectangle and draw a rectangle of that size.
Example: Drawing a sector graph and a divided bar graph
On a particular Saturday, Sanjay measured the number of hours he spent on different activities.
Activity   TV       Internet   Sport     Homework
Time       1 hour   2 hours    4 hours   3 hours
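The calculations described above can be checked for Sanjay's data with a short sketch. The total bar width of 10 cm (`BAR_WIDTH_CM`) is an assumption made for this illustration, since the worksheet does not give one, and the function name `sections` is made up.

```python
# Hours Sanjay spent on each activity, from the table above.
hours = {"TV": 1, "Internet": 2, "Sport": 4, "Homework": 3}

BAR_WIDTH_CM = 10  # assumed total width of the divided bar graph


def sections(values, bar_width):
    """Return (sector angle in degrees, bar width) for each category."""
    total = sum(values.values())
    result = {}
    for category, value in values.items():
        angle = value * 360 / total        # proportion of the full 360 degrees
        width = value * bar_width / total  # proportion of the whole rectangle
        result[category] = (angle, width)
    return result


for activity, (angle, width) in sections(hours, BAR_WIDTH_CM).items():
    print(f"{activity:<9} sector angle = {angle:g} degrees, bar width = {width:g} cm")
```

A useful check is that the sector angles add to 360° and the bar widths add to the total width of the rectangle.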
Activity: Match types of data with suitable types of graphs and discuss.
Misleading Graphs
Example 1:
Solution: Graph B has exaggerated the increase in profit by not starting the scale on the vertical axis at zero
and by enlarging this scale.
Graph C has the opposite effect, diminishing the rate of increase by enlarging the horizontal scale.
Graph D, by using a smaller scale on the horizontal axis, gives a different impression again.
Graph E has an irregular scale on the vertical axis. Graphs A, C and D are fair, although each gives a different
impression. Graphs B and E are misleading.
Range
It is defined as the difference between the highest and lowest scores.
Range = Highest Score – Lowest score
Mode
The mode is simply the outcome that occurs most often; it has the highest frequency.
Median
After a set of scores has been arranged in order, the median is the ‘middle score’. This is only strictly true
if there is an odd number of scores.
For an even number of scores, the median is the average of the middle two scores.
Mean
The mean or average of a set of scores is the sum of all the scores divided by the number of scores.
Mean = (Total of scores) / (Number of scores)
Example1. Explain which statistical measure is referred to in these statements.
a) The majority of people surveyed prefer Activ-8 sports drink. - Mode
b) The ages of fans at the Rolling Stones concert varied from 8 to 80. - Range
c) The average Australian family has 2.1 children. - Mean
Q1. Explain which statistical measure is referred to in these statements.
a) There was a 15° temperature variation during the day.
b) Children at this school are absent 3.4 days per semester, on average.
c) Most often you have to pay $79.95 for those sports shoes.
d) The average Australian worker earns about $470 per week.
e) A middle-income family earns about $35 000 per annum.
Example2. A class of 20 students scored the following marks (out of 10) in a mathematics test:
5 1 7 6 7 9 8 7 6 3
2 3 5 3 5 4 7 9 7 2
Find the mean, mode, median and range.
Mean = (5+1+7+6+7+9+8+7+6+3+2+3+5+3+5+4+7+9+7+2) / 20 = 106 / 20 = 5.3
Mode = 7
Median = (5 + 6) / 2 = 5.5
Range = 9 – 1 = 8
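The four answers above can be reproduced directly from the definitions of mean, mode, median and range. The sketch below is one way to do so; the variable names are illustrative, and it reports every score tied for the highest frequency in case a data set has more than one mode.

```python
from collections import Counter

# Marks (out of 10) for the 20 students in Example 2.
marks = [5, 1, 7, 6, 7, 9, 8, 7, 6, 3,
         2, 3, 5, 3, 5, 4, 7, 9, 7, 2]

ordered = sorted(marks)
n = len(ordered)

# Mean: total of scores divided by number of scores.
mean = sum(marks) / n

# Mode: the score (or scores) with the highest frequency.
counts = Counter(marks)
highest = max(counts.values())
modes = [score for score, freq in counts.items() if freq == highest]

# Median: middle score, or the average of the middle two for an even count.
middle = n // 2
if n % 2 == 1:
    median = ordered[middle]
else:
    median = (ordered[middle - 1] + ordered[middle]) / 2

# Range: highest score minus lowest score.
score_range = ordered[-1] - ordered[0]

print("Mean =", mean)
print("Mode =", modes)
print("Median =", median)
print("Range =", score_range)
```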
Q2. Find the range, mean, median and mode for these simple ordered data sets.
a) 1, 2, 2, 2, 4, 4, 6    b) 1, 4, 8, 8, 9, 10, 10, 10, 12    c) 1, 5, 7, 7, 8, 10, 11
d) 3, 3, 6, 8, 10, 12    e) 7, 11, 14, 18, 20, 20    f) 2, 2, 2, 4, 10, 10, 12, 14
Q3. For the given data sets, find the:
i) mean ii) median iii) mode iv) range
a) 5, 2, 4, 1, 0, 6, 1, 2, 9, 6    b) 1, 7, 1, 3, 2, 6, 1, 5, 9, 10
Example3. Elio’s batting scores in last year’s cricket series were 65, 30, 0, 0, 0, and 80; while Gaetano’s
scores were 0, 30, 30, 80, 25 and 20 in the same matches.
a) Calculate the mean score for each player.
b) Calculate the median score for each player.
c) Which of the mean and median is the better measure of each player’s ability?
Q4. Frank scored 5, 7, 6, 8, 7 in a series of spelling tests, while Erica scored 8, 8, 6, 1, 9 in the same tests.
a) Calculate the mean for each.
b) Find the median for each.
c) Which is the better measure of their abilities?
Q5. The following scores were made by four teams in sports matches.
Jackals: 4, 0, 5, 9, 4, 8
Panthers: 7, 10, 10, 11, 10, 9
Wallabies: 2, 15, 1, 17, 10, 3
Tigers: 9, 10, 20, 25, 0, 14
a) Which team has the highest mean?
b) Which team shows the greatest range of scores?
c) Compare modal scores for Jackals and Panthers.
d) Find the median score for each team.
Q6. The hours a shop assistant spends cleaning the store in eight successive weeks are:
8, 9, 12, 10, 10, 8, 5, 10
a) Calculate the mean for this set of data.
b) Determine the score that needs to be added to this data to make the mean equal to 10.
Q7. Decide if the following data sets are bimodal.
a) 2, 7, 9, 5, 6, 2, 8, 7, 4    b) 1, 6, 2, 3, 3, 1, 5, 4, 1, 9    c) 10, 15, 12, 11, 18, 13, 9, 16, 17
Q8. A netball player scored the following number of goals in her 10 most recent games:
15, 14, 16, 14, 15, 12, 16, 17, 16, 15
a) What is her mean score?
b) What number of goals does she need to score in the next game for the mean of her scores to be 16?
Q9. Write down a set of 5 numbers which has the following values:
a) Mean of 5, median of 6 and mode of 7
b) Mean of 5, median of 4 and mode of 8
c) Mean of 4, median of 4 and mode of 4
d) Mean of 4.5, median of 3 and mode of 2.5
e) Mean of 1, median of 0 and mode of 0
Q10. This dot plot shows the frequency of households with 0, 1, 2 or 3 pets.
a) How many households were surveyed?
b) Find the mean number of pets correct to one decimal place.
c) Find the median number of pets.
d) Find the mode.
e) Another household with 7 pets is added to the list. Does this change the median? Explain.
Q11. Eight numbers have a mean of 9. Seven of the numbers are 9, 7, 10, 6, 11, 6 and 10.
Find the eighth number.
Grouped Data
Example4. For set of scores, find the:
i) Mean
ii) Median
iii) Mode
iv) Range.
Q9. Find the mean, median, mode and range of these scores.
a) b)
c)
Example5. Find the median, mode and range of the data presented in the following stem-and-leaf plots.
Mode = 172
Range = 185 – 142 = 43
Q10. Find the median, mode and range of the data presented in the following stem-and-leaf plots.
Clusters, gaps and outliers
Example1. Identify any clusters, outliers or gaps in the following sets of data.
a) Monthly rainfall: 25 mm, 16 mm, 6 mm, 27 mm, 28 mm, 96 mm
96 mm is an outlier. It is much larger than all the other data values. (There is a large gap between 96 and the
other scores.)
b) c)
Q1. Identify any clusters, outliers or gaps in the following sets of data.
a) 13, 14, 15, 15, 17, 104
Example 2. a) Find the mean, median, mode and range of each set of scores.
i) 3, 5, 5, 7, 9              ii) 3, 5, 5, 7, 90
Mean = 29 / 5 = 5.8           Mean = 110 / 5 = 22
Median = 5                    Median = 5
Mode = 5                      Mode = 5
Range = 6                     Range = 87
b) Draw a dot plot for each set of data and mark the position of the mean, median and mode.
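The effect of the outlier in set ii can be seen by computing the same four measures for both sets side by side. The sketch below uses Python's standard `statistics` module; the helper name `summary` is made up for this illustration.

```python
from statistics import mean, median, mode


def summary(scores):
    """Return the four measures used in this section for a list of scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "mode": mode(scores),
        "range": max(scores) - min(scores),
    }


set_i = [3, 5, 5, 7, 9]
set_ii = [3, 5, 5, 7, 90]  # the same scores, except that 9 is replaced by the outlier 90

for name, scores in (("i", set_i), ("ii", set_ii)):
    print(name, summary(scores))
```

Only the mean and the range change between the two sets; the median and mode stay at 5, which is why they are less affected by an outlier.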
Mean = 699 / 200 ≈ 3.5
Example1. Consider the proportion of 6s in samples of size 5 from the previous results for die rolls. Compare
these with the population proportion.
Proportion of 6s in sample of 5 = (Number of 6s) / 5
b)
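The idea of comparing a sample proportion with the population proportion can be tried out with simulated rolls. The sketch below does not use the worksheet's die-roll results; the 200 rolls are generated by the program, the seed value is arbitrary, and the function name `proportion_of_sixes` is made up for this illustration.

```python
import random

rng = random.Random(1)  # arbitrary fixed seed (an assumption) so a run can be repeated

# Hypothetical population: 200 simulated die rolls, NOT the worksheet's data.
population = [rng.randint(1, 6) for _ in range(200)]


def proportion_of_sixes(rolls):
    """Number of 6s divided by the number of rolls."""
    return rolls.count(6) / len(rolls)


# Take one sample of size 5 from the population.
sample = rng.sample(population, 5)

print("Sample:", sample)
print("Proportion of 6s in sample of 5 =", proportion_of_sixes(sample))
print("Population proportion of 6s =", proportion_of_sixes(population))
```

With only 5 rolls, the sample proportion can only be 0, 0.2, 0.4, 0.6, 0.8 or 1, so it will often be some distance from the population proportion.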
Q2. Consider the proportion of 6s in some samples of size 20 for the data given in the introduction to this
section.
a) Complete the following table.
Q4. Combine the information from the 5 samples in question 2 into one of size 100. Is the proportion of 6s in
this sample a good estimate of the population proportion?
Example3. Consider the means of samples of size 5 taken from the data at the beginning of Section I and
compare this with the population mean.
a) Complete the table.