2025 Data Notes.docx
2025
Area of Study 1
Data Analysis
Chapter 1:
1A Types of Data
Categorical Data
• Categorical variables classify or name a quality or attribute, for example a person's eye colour, study mode, or fitness level.
• Nominal variables have data values that are simply names.
• Ordinal variables have data values that can be used to both name and order.
Numerical Data
• Numerical data have data values which are quantities, generally arising from counting or measuring.
• Discrete data may take on only a countable number of distinct values, such as 0, 1, 2, 3, 4.
• Continuous data can take an infinite number of possible values and are often associated with measuring.
Deciding …..
• Numerical data can always be used to perform arithmetic computations such as finding the average.
• The way the data are recorded can determine its type, i.e. if the variable weight is recorded in kilograms, it is numerical. However, if the data are recorded as 'underweight', 'normal weight', 'overweight', it is categorical.
Example 1
Classify the following data as nominal, ordinal, discrete or continuous.
a. The number of chocolate chips in each of 10 cookies is counted.
b. The time taken for 20 students to complete a puzzle is recorded in seconds.
c. Members of a football club were asked to rate how they felt about the current coach: 1 = Very satisfied, 2 = Satisfied, 3 = Indifferent, 4 = Dissatisfied, 5 = Very dissatisfied.
d. Students are asked to each choose their preferred colour from the list 1 = Blue, 2 = Green, 3 = Red, 4 = Yellow.
e. Students' weights were classified as 'less than 60kg', '60kg - 80kg' or 'more than 80kg'.
1B Displaying and Describing Categorical Data
Example 1
A group of 11 preschool children were asked to choose between chocolate and vanilla ice-cream (C = chocolate, V = vanilla):
CVVCCVCCCVV
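The notes do not state the task for this example, but one natural first step with responses like these is to tally them into a frequency table with percentages. A minimal Python sketch of that tally (the variable names are illustrative only):

```python
from collections import Counter

# Responses copied from the example: C = chocolate, V = vanilla
responses = "CVVCCVCCCVV"

counts = Counter(responses)   # frequency of each category
total = len(responses)        # number of children surveyed

# Map each flavour to (frequency, percentage frequency to one decimal place)
table = {flavour: (n, round(100 * n / total, 1)) for flavour, n in counts.items()}

for flavour, (n, pct) in sorted(table.items()):
    print(f"{flavour}: frequency {n}, percentage {pct}%")
```

The percentages are what a percentage bar chart or segmented bar chart would display.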
Example 2
Constructing a percentage segmented bar chart from a frequency table.
The climate type of 23 countries is classified as 'cold', 'mild' or 'hot'. Construct a percentage frequency segmented
bar chart to display this information.
1C Displaying and describing Numerical Data
Example 1
Constructing a frequency table for discrete numerical data taking a small number of values.
The number of bedrooms in each of the 24 properties sold in a certain area over a one month period are as follows:
2 3 4 3 3 4
3 4 4 1 3 2
1 2 2 2 4 5
3 4 4 5 3 4
Construct a table for these data showing both frequency and percentage frequency, to one decimal place.
Grouped Frequency table
• Every data value should be in an interval.
• The intervals should not overlap.
• There should be no gaps between the intervals.
• A division which results in about 5 to 15 groups,
• Choose an interval width that is easy to interpret
• intervals of 0–49, 50–99, 100–149 would be preferred over the intervals 1–50, 51–100, 101–150
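For the bedroom data in Example 1, the frequency table can be checked with a short script. This is only a sketch of the counting step, written in Python rather than on the CAS; the names `bedrooms` and `rows` are illustrative.

```python
from collections import Counter

# Number of bedrooms in each of the 24 properties (from Example 1)
bedrooms = [2, 3, 4, 3, 3, 4,
            3, 4, 4, 1, 3, 2,
            1, 2, 2, 2, 4, 5,
            3, 4, 4, 5, 3, 4]

freq = Counter(bedrooms)
n = len(bedrooms)

# One row per distinct value: (value, frequency, percentage to 1 d.p.)
rows = [(value, freq[value], round(100 * freq[value] / n, 1))
        for value in sorted(freq)]

print("Bedrooms  Number  %")
for value, count, pct in rows:
    print(f"{value:<9} {count:<7} {pct}")
print(f"Total     {n}")
```

Because each percentage is rounded separately, the percentage column may not add to exactly 100.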
Example 2
The data below give the average hours worked per week in 23 countries. Construct a grouped frequency table with
five intervals.
Example 3
The data below give the average hours worked per week in 23 countries. Construct a grouped frequency table with
five intervals.
35.0 48.0 45.0 43.0 38.2 50.0 39.8 40.7 40.0 50.0 35.4 38.8
40.2 45.0 45.0 40.0 43.0 48.8 43.3 53.1 35.6 44.1 34.8
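A grouped frequency table for these 23 values can be sketched as follows. The starting point of 30.0 and the interval width of 5 are assumptions, chosen to give five intervals (30.0–34.9 up to 50.0–54.9).

```python
# Average hours worked per week in 23 countries (from the example)
hours = [35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4, 38.8,
         40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8]

START, WIDTH, GROUPS = 30.0, 5.0, 5   # assumed interval settings

counts = [0] * GROUPS
for h in hours:
    index = int((h - START) // WIDTH)  # which interval the value falls in
    counts[index] += 1

# Percentage frequency for each interval, to one decimal place
percentages = [round(100 * c / len(hours), 1) for c in counts]

for i, (c, p) in enumerate(zip(counts, percentages)):
    low = START + i * WIDTH
    print(f"{low:.1f}-{low + WIDTH - 0.1:.1f}  {c}  {p}")
```

Every value lands in exactly one interval, so the intervals neither overlap nor leave gaps.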
Constructing a Histogram
From a frequency table
• frequency (count or per cent) is shown on the vertical axis
• the values of the variable being displayed are plotted on the horizontal axis
• each bar in a histogram corresponds to a data interval
• the height of the bar gives the frequency (or the percentage frequency).
From raw data
• Create a grouped frequency table first.
Example 1
Construct a histogram from a frequency table
Average hours worked   Frequency (Number)   Frequency (%)
30.0−34.9              1                    4.3
35.0−39.9              6                    26.1
40.0−44.9              8                    34.8
45.0−49.9              5                    21.7
50.0−54.9              3                    13.0
Total                  23                   99.9
Example 2
CAS Tips
Construct a histogram from a frequency table
50.0–54.9 3
Total 23
Analysing a Histogram
The purpose of constructing a histogram is to help understand the key features of the data distribution. These
features are:
• Shape: symmetrical, bimodal, positive skew, negative skew
• Centre: middle
• Spread: wide, narrow
• Outliers: extreme values
Example 1
The histogram shows the gestation period (completed weeks) for a sample of 1000 babies born in Australia in one year. Describe this histogram in terms of centre, shape, spread and outliers, in order.
Analysing a histogram
Write a paragraph including all 4 features.
1D Dot Plots and Stem Plots
Constructing a dot plot
Constructing a Stem Plot
22 3 37 17 55 30 1
• Include a key
• Ensure numbers are spaced out evenly.
• Splitting the stems is useful when there are only a few different values for the stem.
Example 3
2 12 13 9 18
17 7 16 12 10
16 14 11 15 16
15 17
1E Using a logarithmic (base 10) scale to display data
Many numerical variables that we deal with in statistics have values that range over several orders of magnitude, so very small and very large numbers need to be represented on the same scale.
numbers   0.01    0.1     1      10     100    1000   10 000   100 000   1 000 000
powers    10^−2   10^−1   10^0   10^1   10^2   10^3   10^4     10^5      10^6
logs      −2      −1      0      1      2      3      4        5         6
Logarithmic Transformation
• logs
Example 1
Write the number 100 as a power of 10, and then write down its logarithm.
a. 1 b. 10
number    1      10     100
indices   10^0   10^1   10^2
logs      0      1      2
Example 2
a. Use your CAS to find the logs of the following numbers to 1 d.p.
a. 45 b. 245 c. 3546
b. Find the numbers whose logarithms are as follows, to one decimal place.
a. 3.1876 b. 2.8517 c. 4.6531
CAS Tips
FROM Number TO Log: log10(45) enter
FROM Log TO Number: 10^1.65321 enter
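The two directions of the conversion (number to log, and log back to number) can also be sketched in Python as a check on the CAS working. The list names are illustrative.

```python
import math

# Part a: number -> log (base 10), to one decimal place
numbers = [45, 245, 3546]
logs = [round(math.log10(x), 1) for x in numbers]
print(logs)

# Part b: log -> number, i.e. raise 10 to the power of the log
log_values = [3.1876, 2.8517, 4.6531]
originals = [round(10 ** v, 1) for v in log_values]
print(originals)
```

Raising 10 to the power of a log undoes the log, which is why `10 ** math.log10(45)` returns 45 up to floating-point rounding.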
Example 3
The histogram shows the distribution of the weights of 27 animal species plotted on a log scale.
a. What body weight (in kg) is represented by the number 4 on the log scale?
b. How many of these animals have body weights more than 10 000 kg?
c. The weight of a cat is 3.3 kg. Use your calculator to determine the log of its weight correct to two significant figures.
d. Determine the weight (in kg) of the animal with a log(body weight) of 3.4 (the elephant). Write your answer correct
CAS Tips
• Add List and Spreadsheets
• Enter x into column A
• Name column B as 'logs'
• Directly below type '= log10(x)' into template
• Add Data and Statistics
• Click to add variable
• Plot Properties>Histogram Properties>Bin Settings>Equal Bin Width and set the column width (bin) to 1 and alignment (start point) to −2 and use menu>Window/Zoom>Zoom-Data to rescale.
1F Measures of centre and spread
The mean, the median, the range, the interquartile range, standard deviation
The median
• Measures centre and is used when data is skewed.
• Middle value in an ordered data set
• Position of the median: $\frac{n+1}{2}$ (n odd); the mean of the values at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$ (n even)
The mean
• Measures centre
• $\text{mean} = \frac{\text{sum of values}}{\text{total no. of values}}$
The IQR
• Measures spread of the middle 50% of the data
• Used if outliers present
• $IQR = Q_3 - Q_1$
Standard deviation
• $S.D. = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Example 1
Order each of the following data sets, locate the median, and then write down its value.
a. 2 9 1 8 3 5 3 8 1
b. 10 1 3 4 8 6 10 1 2 9
Example 2
Finding the median value from a dot plot
The dot plot opposite displays the age distribution (in years) of the
Example 3
The histogram shows the average number of hours per week a group of 23 people spent on the internet. Find possible values for the median and quartiles of this distribution.
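The position rule for the median can be sketched in Python using the two data sets from Example 1 of this section. The function name `median_by_position` is illustrative; the result is cross-checked against the standard library's `statistics.median`.

```python
from statistics import median

data_a = [2, 9, 1, 8, 3, 5, 3, 8, 1]        # n = 9 (odd)
data_b = [10, 1, 3, 4, 8, 6, 10, 1, 2, 9]   # n = 10 (even)

def median_by_position(values):
    """Order the data, then apply the (n + 1)/2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the median is the value at position (n + 1)/2 (1-based)
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the values at positions n/2 and n/2 + 1 (1-based)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_by_position(data_a), median(data_a))
print(median_by_position(data_b), median(data_b))
```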
Example 4
The stem plot displays the maximum temperature (in ∘C) for 12 days in January. Determine the median maximum
temperature for these 12 days.
Example 5
Example 6
Find the interquartile range of the weights of the 18 cats whose weights are displayed in the ordered stem plot
below. (n = even)
1 | 2 represents 1.2 kg
Example 7
The stem plot shows the life expectancy (in years) for 23 countries. Find the IQR for life expectancies.
Stem: 5|2=52 years
Example 8
Calculating the mean from the formula. The following is a set of reaction times (in milliseconds):
38 36 35 43 46 64 48 25
Write down the values of the following, correct to one decimal place.
a. n b. Σx c. x̄
CAS TIPS
• Add List and Spreadsheets
• Enter x into column A
• Enter data
• Menu Stats
• Stats Calculations
• One variable statistics
Symbols
x̄ = mean
sx = standard deviation
Q2 = median
n = number of values in the set
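The quantities in Example 8 can be checked with a short Python sketch, offered as an alternative to the CAS steps. It applies the mean formula and the sample standard deviation formula (dividing by n − 1) to the reaction times.

```python
import math
from statistics import stdev

# Reaction times in milliseconds (from Example 8)
times = [38, 36, 35, 43, 46, 64, 48, 25]

n = len(times)          # number of values
total = sum(times)      # sum of the values
x_bar = total / n       # mean = sum of values / number of values

# Sample standard deviation: square root of sum((x - mean)^2) / (n - 1)
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in times) / (n - 1))

print(n, total, round(x_bar, 1), round(s_x, 1))
```

The hand-built `s_x` agrees with `statistics.stdev`, which uses the same n − 1 divisor.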
1G The Five Number Summary and the Box Plot
Five-number summary
• minimum,
• Q1,
• Q2 or M (the median),
• Q3,
• maximum
Example 1
The stem plot shows the distribution of life expectancies (in years) in 23 countries. Construct and analyse the
boxplot.
Analysing Box Plots with outliers
Example 2
a. the median
b. the quartiles Q1 and Q3
c. the interquartile range (IQR)
d. the minimum and maximum values
e. the values of any possible outliers
f. the smallest value in the upper end of the data set that will be classified as an outlier
g. the largest value in the lower end of the data set that will be classified as an outlier.
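Parts e to g depend on a rule for deciding when a value counts as an outlier. These notes do not state the rule, so the sketch below assumes the common convention of fences 1.5 × IQR beyond the quartiles. The quartile values are hypothetical, since the boxplot for this example is not reproduced here.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) under the assumed 1.5 x IQR convention."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical quartiles, for illustration only
lower, upper = outlier_fences(q1=20, q3=30)
print(lower, upper)

def is_outlier(value, q1, q3):
    """A value is flagged if it lies below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return value < low or value > high
```

Under this convention, values strictly beyond a fence are flagged, and a value equal to a fence is not.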
Example 3
Example 4
Example 5
Describe the distributions represented by the boxplot.
Example 6
1H The normal distribution and the 68–95–99.7% rule
68–95–99.7% Rule
About 68% of the data lie in the interval $(\bar{x} - s,\ \bar{x} + s)$.
About 95% of the data lie in the interval $(\bar{x} - 2s,\ \bar{x} + 2s)$.
About 99.7% of the data lie in the interval $(\bar{x} - 3s,\ \bar{x} + 3s)$.
STANDARD SCORES
$z = \frac{x - \bar{x}}{s}$
• Meaning of z-score
• a positive z-score = above the mean
• a z-score of zero = equal to the mean
• a negative z-score = below the mean.
Example 1
The heights of a group of young women have a mean of $\bar{x} = 160$ cm and a standard deviation of $s = 8$ cm. Determine the standard or z-scores of a woman who is:
a. 172 cm tall b. 150 cm tall c. 160 cm tall.
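The standard score formula applied to Example 1 can be sketched as follows. The function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

MEAN_HEIGHT, SD_HEIGHT = 160, 8   # cm, from Example 1

heights = [172, 150, 160]
scores = [z_score(h, MEAN_HEIGHT, SD_HEIGHT) for h in heights]
print(scores)
```

The signs match the meanings listed above: 172 cm is above the mean (positive), 150 cm is below it (negative), and 160 cm is equal to it (zero).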
Example 2
c. Another student studying the same two subjects obtained a mark of 55 for both Psychology and Statistics.
Does this mean that she performed equally well in both subjects?
Example 3
Suppose the weight of a certain species of bird is normally distributed with a mean of 42 grams with a standard
deviation of 3 grams. If a bird selected at random from this population has a standardised weight of z=−1, what
percentage of birds in this population weigh more than this bird? Approximately what percentage of birds would
weigh between 39 and 48 grams?
Example 4
A class test (out of 50) has a mean mark of $\bar{x} = 34$ and a standard deviation of $s = 4$. Joe's standardised test mark was $z = -1.5$. What was Joe's actual mark?
Example 5
Suppose the heights of red flowering gum trees have a mean of 10.2 metres, and 2.5% of these trees grow to more
than 11.4 metres tall. Assuming that the heights of these trees are approximately normally distributed, what is the
standard deviation of the height of the red flowering gum trees?
Example 6
The marks scored in an examination are known to be approximately normally distributed. If 16% of students score
more than 80 marks, and 2.5% of students score less than 20 marks, estimate the mean and standard deviation of
this distribution.
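Examples 4 to 6 all run the z-score formula backwards, combined with the 68–95–99.7% rule. One possible line of working is sketched below, not a prescribed method. It assumes the tail percentages implied by the rule: 2.5% of values lie beyond two standard deviations on each side, and 16% lie beyond one standard deviation on each side.

```python
def actual_from_z(z, mean, sd):
    """Rearranged z-score formula: x = mean + z * sd."""
    return mean + z * sd

# Example 4: Joe's actual mark
joe_mark = actual_from_z(-1.5, 34, 4)

# Example 5: 2.5% of trees exceed 11.4 m, so 11.4 is taken as mean + 2s
gum_sd = (11.4 - 10.2) / 2

# Example 6: 16% score above 80, so 80 is taken as mean + s;
# 2.5% score below 20, so 20 is taken as mean - 2s.
# Subtracting the two statements gives 3s = 80 - 20.
exam_sd = (80 - 20) / 3
exam_mean = 80 - exam_sd

print(joe_mark, round(gum_sd, 1), exam_mean, exam_sd)
```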
Chapter 2: Investigating associations between two variables
2A Bivariate data - Classifying the variables
Bivariate data
• Where the two variables are linked in some way (associated) so that they vary together, thus bivariate
• One of the variables is treated as the explanatory variable. The other variable is the response variable. We use the explanatory variable to explain changes that might be observed in the response variable.
Example 1
For each of the following questions, determine if they involve investigating associations between one numerical variable and one categorical variable or two categorical variables or two numerical variables.
a. Are younger people (age measured in years) more likely to believe in astrology (measured as 'yes' or 'no') than older people?
b. Do people who weigh more (weight measured in kg) tend to have higher blood pressure (blood pressure measured in mmHg)?
Example 2
We wish to investigate the question, 'Does the time it takes a student to get to school depend on their mode of
transport?' The variables here are time and mode of transport. Which is the response variable (RV) and which is the
explanatory variable (EV)?
Example 3
a. We wish to investigate the question, 'Can we predict people's height (in cm) from their wrist measurement?' The variables in this investigation are height and wrist measurement. Which is the response variable (RV) and which is the explanatory variable (EV)?
b. We wish to investigate the question, 'Can we predict people's wrist measurement from their height?' Which is the response variable (RV) and which is the explanatory variable (EV)?
Framing the question
The way the question is framed determines the EV and RV.
2B Investigating associations between categorical variables
Constructing a two-way frequency table
Question – Does support for gun control depend on where a person lives?
Residence
Country City
For
Against
total
Yes, a relationship exists. In this sample of 100 people, a higher percentage of city people were for gun control than country people: 71.4% to 55.2%. This indicates that a person's attitude to gun control is associated with their place of residence.
Example 1
University
male female
yes
no
total
University
male female
yes
no
total
University
male female
yes
no
total
Example 2
Two-way frequency tables for categorical variables taking more than two values
Example 3
Example 4
The segmented bar chart-from a two-way frequency table
Segmented bar chart
• A segmented bar chart is a visual display of a two-way frequency table.
• consists of separate bars for each value of the EV,
• each bar separated into parts (segments)
• it shows the percentage for each value of the response variable.
Example 5
Example 6
2C Investigating the association between a numerical and a categorical variable
✓ Parallel dot plots,
✓ Back-to-back stem plots or
✓ The parallel boxplots.
The parallel dot plots below display the distribution of the number of sit-ups performed by 15 people before and after they had completed a gym program. Do the parallel dot plots support the contention that the number of sit-ups performed is associated with completing the gym program? Write a brief explanation that compares the distributions.
Describe the distribution
• shape
• centre
• spread
• outliers
Analyse the distribution
• Medians
• Mean
• IQR
• Range
• Standard deviation
Note:
Comparative Report
• Comparative language, higher, lower, more, less, greater, smaller
Example 2
The back-to-back stem plot below displays the distribution of life expectancy (in years) for the same 13 countries in
1970 and 2010. Do the back-to-back stem plots support the contention that life expectancy has changed between
these two time periods?
Example 3
Use the following parallel boxplots to compare the pulse rates (in beats/minute) for a group of 70 male
students and 90 female students.
Example 4
Use the parallel boxplots below to compare the salary distribution for workers in a certain industry across four
different age groups: 20–29 years, 30–39 years, 40–49 years and 50–65 years.
2D Investigating associations between two numerical variables
Scatterplots
• A scatterplot - when both of the variables are numerical.
• each point represents a single case.
• the vertical or y-axis for the response variable (RV)
• the horizontal or x-axis for the explanatory variable (EV).
Example 1
We wish to investigate the association between university participation rate (the EV) and average hours worked (the RV) in nine countries. The starting point for this investigation is again a graphical display of the data. Here our option is to construct a scatterplot. The data for 9 countries are shown below.
CAS TIPS
• Add List and Spreadsheets
• Label column A with EV
• Label column B with RV
• Add Data and statistics
• Click to add variable
• Create the scatterplot
2E How to interpret a scatterplot- Describing Association
• Direction (positive or negative)
• Strength (strong, moderate, weak)
• Form (linear or non-linear)
Association
Positive association - when the value of the response variable tends to increase as the value of the explanatory variable increases.
Form
Linear - A scatterplot is said to have a linear form when the points tend to follow a straight line.
Non-linear - A scatterplot is said to have a non-linear form when the points tend to follow a curved line.
Example 1
Classify the association and form
Example 2
Classify association and direction
Example 1
Classify the strength of each of the following linear associations using the previous table:
a. r = 0.35
b. r = −0.507
c. r = 0.992
d. r = −0.159
CAS TIPS
• Add list and spreadsheets
• Menu, stats, stats calculations,
• Look for r in the list of statistics.
2G The coefficient of determination
Coefficient of Determination
Degree of Prediction
The degree to which one variable can be predicted from another linearly related variable is given by a statistic called the coefficient of determination.
Calculating r²
By squaring r, expressed as a %.
Interpreting
The coefficient of determination (as a percentage) tells us the percentage of the variation in the response variable that is explained by the variation in the explanatory variable.
Example 1
If the correlation between weight and height is r = 0.8, find the value of the coefficient of determination. Express your answer as a percentage.
Example 2
It is found the coefficient of determination between height and weight to be 0.64 (or 64%). Interpret this value in terms of the variables weight and height.
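A small sketch of the calculation and the wording of the interpretation follows. The examples do not say which variable is the response variable, so treating weight as the RV and height as the EV is an assumption made only so that the sentence can be printed.

```python
def coefficient_of_determination(r):
    """Square the correlation coefficient and express it as a percentage."""
    return round(100 * r ** 2, 1)

def interpret(r, response, explanatory):
    """Build the standard interpretation sentence for r squared."""
    pct = coefficient_of_determination(r)
    return (f"{pct}% of the variation in {response} is explained by "
            f"the variation in {explanatory}.")

percentage = coefficient_of_determination(0.8)
print(percentage)

# Assumption: weight is the response variable, height the explanatory variable
print(interpret(0.8, response="weight", explanatory="height"))
```

Because r is squared, r = 0.8 and r = −0.8 give the same coefficient of determination.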
Example 3
Example 4
Example 5
2H Correlation and causality
Correlation does not imply causality.
• A correlation tells you about the strength of the association between the variables, but no more. It tells you
nothing about the source or cause of the association.
• Common response, confounding factors and coincidence are all possible explanations, so further investigation often needs to be undertaken.
2I Which graph?
Chapter 3: Investigating and modelling linear associations
Example 1
The height and weight of 11 people have been recorded, and the values of the following statistics determined.
Calculate the values of the slope and intercept rounded to two significant figures.
height weight
Example 2
Use the following information to find the value of the correlation coefficient r, rounded to three significant figures.
hours studied exam score
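The statistics tables for Examples 1 and 2 are not reproduced in these notes, so the sketch below uses hypothetical summary values. It assumes the standard least squares relationships slope = r × s_y / s_x and intercept = ȳ − slope × x̄. Rearranging the first gives r = slope × s_x / s_y, which is the kind of calculation Example 2 asks for.

```python
def least_squares_line(r, x_mean, x_sd, y_mean, y_sd):
    """Slope and intercept of the least squares line from summary statistics."""
    slope = r * y_sd / x_sd
    intercept = y_mean - slope * x_mean
    return slope, intercept

def correlation_from_slope(slope, x_sd, y_sd):
    """Rearranged slope formula: r = slope * s_x / s_y."""
    return slope * x_sd / y_sd

# Hypothetical summary statistics (NOT the values from Example 1 or Example 2)
slope, intercept = least_squares_line(r=0.75, x_mean=170.0, x_sd=10.0,
                                      y_mean=70.0, y_sd=12.0)
print(round(slope, 2), round(intercept, 2))

# Recovering r from the slope returns the value we started with
r_back = correlation_from_slope(slope, x_sd=10.0, y_sd=12.0)
print(round(r_back, 3))
```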
3B Using the least squares regression line to model a relationship
We wish to investigate the nature of the association between the price of a second-hand car and its age.
The ultimate aim is to find a mathematical model that will enable the price of a second-hand car to be
predicted from its age.
Regression equation
𝑝𝑟𝑖𝑐𝑒 = 35100 − 3940 × 𝑎𝑔𝑒
y-variable = price
x-variable = age
a = 35100
b = −3940
r = −0.9643
r² = 0.9299
Interpretation:
Predicting using the equation
As a general rule, a regression equation only applies to the range of values of the explanatory variable used to determine the equation.
Predicting within the range of values of the explanatory variable is called interpolation. Interpolation is generally considered to give a reliable prediction.
Predicting outside the range of values of the explanatory variable is called extrapolation. Extrapolation is generally considered to give an unreliable prediction.
Example 1
The equation of a regression line that enables the price of a second-hand car
to be predicted from its age is: 𝑝𝑟𝑖𝑐𝑒 = 35100 − 3940 × 𝑎𝑔𝑒.
a. Use this equation to predict the price of a car that is 5.5 years old.
State the reliability of the prediction.
b. Use this equation to predict the price of a car that is 2 years old.
State the reliability of the prediction.
c. Use this equation to predict the price of a car that is 10 years old.
State the reliability of the prediction.
d. Use this equation to predict the price of a car that is 12 years old. State the reliability of the prediction.
e. Find the coefficient of determination and explain its meaning.
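The predictions in parts a to d can be sketched in Python. The range of ages in the original data set is not given in these notes, so `ASSUMED_MIN_AGE` and `ASSUMED_MAX_AGE` below are hypothetical placeholders. Substitute the real range from the scatterplot before relying on the interpolation or extrapolation labels.

```python
def predict_price(age):
    """Regression equation from the notes: price = 35100 - 3940 * age."""
    return 35100 - 3940 * age

def reliability(age, min_age, max_age):
    """Label a prediction by whether the age lies inside the data range."""
    if min_age <= age <= max_age:
        return "interpolation (generally reliable)"
    return "extrapolation (generally unreliable)"

# Hypothetical data range: replace with the ages actually observed
ASSUMED_MIN_AGE, ASSUMED_MAX_AGE = 1, 8

for age in (5.5, 2, 10, 12):
    price = predict_price(age)
    label = reliability(age, ASSUMED_MIN_AGE, ASSUMED_MAX_AGE)
    print(f"age {age}: predicted price {price}, {label}")
```

At 10 and 12 years the equation returns a negative price, which shows why predictions far outside the data cannot be trusted.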
Residual plot-further analysis
A residual plot is a graph of the residuals (plotted on the vertical axis) against the explanatory variable (plotted on the horizontal axis), where:
residual = actual value − predicted value
Remember, the residual value informs us of the distance of individual data values away from the regression line.
However, sometimes the scatterplot is not sensitive enough to reveal the non-linear structure of a relationship. To
gain more information we need to investigate the fit of the regression line to the data, and we do this using a
residual plot. The residual plot is used to check the linearity assumption required for a linear regression.
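A residual is the actual value minus the value predicted by the regression line. A minimal sketch using the price equation from these notes follows. The car in the example (6 years old, sold for 12 000) is hypothetical, because the data table is not reproduced here.

```python
def predict_price(age):
    """Regression equation from the notes: price = 35100 - 3940 * age."""
    return 35100 - 3940 * age

def residual(actual, predicted):
    """Residual = actual value - predicted value."""
    return actual - predicted

# Hypothetical data point, for illustration only
age, actual_price = 6, 12000

predicted = predict_price(age)
res = residual(actual_price, predicted)
print(predicted, res)
```

A positive residual means the point lies above the regression line, and a negative residual means it lies below. Plotting the residuals against age for every car gives the residual plot.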
Example 1 continued
Calculating a residual
Calculate the residual when its price is predicted using the regression equation and mark it on the graph.
Steps
Example 2
Which of the following residual plots would call into question the assumption of linearity in a regression analysis?
Give reasons for your answers.
Example 3
Construct a report to describe the association between the price and age of second-hand cars.
• Association
• Strength
• Form
• Direction
• r and r²
• Equation
• Slope
• Y-intercept
• Residual Plot
• Conclusion
Example 4
The table below shows the scores obtained by nine students on two tests. We want to be able to predict test B scores from test A scores. Perform a regression analysis.
CHAPTER 4 DATA TRANSFORMATIONS
Data transformation is a process through which the data is linearised. The circle of transformation is an easy way to determine which transformations are best to apply, depending on the type of curve. Possible transformations to the y and x axis are
Once the curve is identified, all three possible transformations are applied and r² is found for each. Based on the highest r² and the residual plot, the chosen equation is rewritten to reflect the type of transformation applied, the equations are compared to the original and predictions are made.
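The idea of trying several transformations and comparing r² can be sketched in Python. Everything here is illustrative: the data are invented (generated from y = 3x² + 2 so that the answer is known in advance), and the sketch only transforms the x variable. In practice the residual plot should be checked as well as r².

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination for a straight-line fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data with a known curved relationship
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3 * v ** 2 + 2 for v in x]

# Candidate transformations of the explanatory variable
transformations = {
    "none": lambda v: v,
    "x squared": lambda v: v ** 2,
    "log x": math.log10,
    "1/x": lambda v: 1 / v,
}

results = {name: r_squared([f(v) for v in x], y)
           for name, f in transformations.items()}
best = max(results, key=results.get)

for name, value in results.items():
    print(f"{name}: r squared = {value:.4f}")
print("highest r squared:", best)
```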
4A The squared transformation
Example 1
A base jumper leaps from the top of a cliff, 1560 metres above the valley
floor. The scatterplot below shows the height (in metres) of the base
jumper above the valley floor every second, for the first 10 seconds of the
jump.
Use the least squares equation to predict to the nearest metre the
height of the base jumper after 3.4 seconds.
Example 2
Apply a squared transformation to the variable yield, and determine the least
squares regression line for the transformed data.
Use the least squares equation to predict the yield of a plant given
4B Logarithmic transformation
Example 1
Apply a log transformation to the variable GDP, and determine the least
squares regression line for the transformed data.
Use the least squares equation to predict the lifespan of a country with a
GDP of $20 000 per person, giving your answer rounded to one decimal place.
Example 2
The scatterplot shows that there is a strong positive association between the number of cases and the day.
Apply a log transformation to the variable cases, and determine the least squares regression line for the transformed data.
Use the least squares equation to predict the number of cases on day 13.
4C The reciprocal transformation
Example 1
After embarking on a new healthy eating and exercise plan, Ben recorded his weekly weight loss over a 10-week period. The association is not linear, as can be seen in the scatterplot below, which plots weekly weight loss in kg against length of diet in weeks.
Example 2
sticky label which is 5 cm long, giving your answer to two decimal places.
Example 3
The scatterplot shows the age (in years) and diameter at a height of
1.5 metre (in cm) for a sample of 19 trees of the same species. Use an
appropriate transformation to find a regression model which allows
the age of this species of tree to be predicted from its diameter.
5A Time series data
Key features of time series graphs
The features we look for in a time series are:
• trend
• cycles
• seasonality
• structural change
• possible outliers
• irregular (random) fluctuations.
Exercise 1
Maximum temperature was recorded each day for a week in a certain town. Construct a time series plot of the data.
Day         M   T   W   T   F   S   S
Temp (∘C)   20  21  25  36  34  25  26
Exercise 2
Example 4
Example 5
The time series plot below shows the power bill for a rental house
(in kWh) for the 12 months of a year. Comment on any structural
change in the plot.
Example 6
The time series plot below shows the daily power bill for a house
(in kWh) for a fortnight. Comment on any outliers in the plot.
5B Smoothing a time series using moving means
Smoothing is a process which involves replacing individual data points with the mean of the data point and some adjacent points. This allows for trends in the data to be observed more clearly, as the presence of irregular fluctuations, seasonality or cycles may obscure underlying trends.
Example 1
Three- and five-moving mean smoothing of a time series
The following table gives the number of births per month over a calendar year in a country hospital. Use the three-moving mean and the five-moving mean methods, rounded to one decimal place, to complete the table.
Example 2
The table below gives the temperature (∘C) recorded at a weather station at 9.00 a.m. each day for a week. Calculate the three and five moving mean smoothed temperature for Tuesday.
Day      M   T   W   T   F   S   S
Raw
3-mean
5 mean
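Three- and five-moving mean smoothing can be sketched in Python. The sketch uses the monthly births values that appear in the Month/Births table in these notes (10, 12, 6, 5, 22, 18, 13, 7, 9, 10, 8, 5). The function name `moving_mean` is illustrative, and `None` marks months where the window does not fit.

```python
# Births per month, January to December (from the Month/Births table)
births = [10, 12, 6, 5, 22, 18, 13, 7, 9, 10, 8, 5]

def moving_mean(values, k):
    """Moving mean for odd k: replace each point with the mean of the
    k values centred on it. Points too close to either end get None."""
    half = k // 2
    smoothed = []
    for i in range(len(values)):
        if i - half < 0 or i + half >= len(values):
            smoothed.append(None)           # not enough neighbours
        else:
            window = values[i - half:i + half + 1]
            smoothed.append(round(sum(window) / k, 1))
    return smoothed

three_mean = moving_mean(births, 3)
five_mean = moving_mean(births, 5)
print(three_mean)
print(five_mean)
```

The five-moving mean loses two points at each end of the series, compared with one at each end for the three-moving mean.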
Example 3
Two-moving mean smoothing with centring
The temperatures (∘C) recorded at a weather station at 9 a.m. each day for a week are displayed in the table. Calculate the two-moving mean smoothed temperature with centring for Tuesday.
Example 4
Four- and six-moving mean smoothing with centring
The table below gives the temperature (°C) recorded at a weather station at 9.00 a.m. each day for a week. Calculate the four and six smoothed temperature with centring for Thursday.
Month    J   F   M   A   M   J   J   A   S   O   N   D
Births   10  12  6   5   22  18  13  7   9   10  8   5
Smoothing by finding the mean where n = even
Two-moving mean
1. Locate data value to be smoothed.
2. Identify the 3 values centred on the data point.
3. Find the mean of the first two values.
4. Find the mean of the last two values.
5. Find the mean of the two answers.
Four-moving mean
1. Locate data value to be smoothed.
2. Identify the 5 values centred on the data point.
3. Find the mean of the first four values.
4. Find the mean of the last four values.
5. Find the mean of the two answers.
Six-moving mean
1. Locate data value to be smoothed.
2. Identify the 7 values centred on the data point.
3. Find the mean of the first six values.
4. Find the mean of the last six values.
5. Find the mean of the two answers.
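The centring steps for an even number of points can be sketched in Python. For a k-moving mean with centring, take the k + 1 values centred on the point, find the mean of the first k and of the last k, then average the two answers. The births values from the table are used only as sample data, and the function name is illustrative.

```python
births = [10, 12, 6, 5, 22, 18, 13, 7, 9, 10, 8, 5]   # January to December

def centred_moving_mean(values, i, k):
    """Centred k-moving mean (k even) for the value at index i."""
    half = k // 2
    if i - half < 0 or i + half >= len(values):
        raise ValueError("not enough values on both sides of this point")
    window = values[i - half:i + half + 1]   # the k + 1 values centred on i
    first_mean = sum(window[:k]) / k         # mean of the first k values
    last_mean = sum(window[1:]) / k          # mean of the last k values
    return (first_mean + last_mean) / 2      # centre by averaging the two

two_feb = centred_moving_mean(births, 1, 2)    # February, two-moving mean
four_may = centred_moving_mean(births, 4, 4)   # May, four-moving mean
six_may = centred_moving_mean(births, 4, 6)    # May, six-moving mean
print(two_feb, four_may, round(six_may, 2))
```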
5C Smoothing a time series plot using moving medians
5D Seasonal indices
• Remember, seasonality is a characteristic of a time series in which the data experiences regular and predictable fluctuations or patterns that recur every calendar year.
• When the data is considered to have a seasonal component, it has noticeable peaks and troughs and it is often necessary to remove these so any underlying trend is clearer. The seasonal index measures the extent of the seasonal component by comparing a season with the average season.
• The process of removing the seasonal component is called deseasonalising the data, or seasonally adjusting the data, for the purpose of clarity in identifying trends.
• We can use seasonal indices to remove or add the seasonal component (deseasonalise or reseasonalise).
• To deseasonalise the data we need to calculate seasonal indices, which tell us how a particular season (generally a day, month or quarter) compares to the average season.
• Seasonal indices are calculated so that their average is 1 or 100%. This means that the sum of the seasonal indices equals the number of seasons. E.g. if the seasons are months, the seasonal indices add to 12. If the seasons are quarters, then the seasonal indices add to 4.
• Re-seasonalising data: actual value = deseasonalised value × seasonal index
Example 1
Interpreting seasonal indices
Suppose that the seasonal indices (SI) for electricity usage in Esse's home are as shown in the table:
What does the seasonal index for Winter tell us?
What does the seasonal index for Spring tell us?
Example 2
Using seasonal indices
The seasonal indices (SI) for cold drink sales for Imogen's kiosk are as shown in the table. If the actual cold drink sales last summer totalled $21 653, what is the deseasonalised sales figure for that time period? If the deseasonalised cold drink sales last spring totalled $10 870, what were the actual sales for that time period?
• The percentage change is calculated by
$\text{percentage change} = \frac{100}{SI} - 100$
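Deseasonalising, reseasonalising and the percentage change formula can be sketched together in Python. The seasonal indices are the quarterly values listed for Mikki's shop later in these notes. The winter sales figure of 1260 is hypothetical, and deseasonalised value = actual value ÷ seasonal index is assumed as the rearrangement of the re-seasonalising rule.

```python
# Quarterly seasonal indices listed for Mikki's shop in these notes
seasonal_index = {"Summer": 1.16, "Autumn": 0.94, "Winter": 1.26, "Spring": 0.64}

def deseasonalise(actual, si):
    """Remove the seasonal component: actual value divided by the index."""
    return actual / si

def reseasonalise(deseasonalised, si):
    """Put the seasonal component back: deseasonalised value times the index."""
    return deseasonalised * si

def percentage_change(si):
    """Percentage change needed to correct for seasonality: 100/SI - 100."""
    return 100 / si - 100

# The indices for four quarters should add to 4 (an average of 1)
index_total = sum(seasonal_index.values())

# Hypothetical winter sales figure, for illustration only
winter_actual = 1260
winter_deseasonalised = deseasonalise(winter_actual, seasonal_index["Winter"])

print(round(index_total, 2), round(winter_deseasonalised, 1))
print(round(percentage_change(seasonal_index["Winter"]), 1))
```

An index above 1 gives a negative percentage change (the figure must be decreased to correct for seasonality), and an index below 1 gives a positive one.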
place.
Example 4 Calculating SI
Example 5
Calculating seasonal indices (several years' data)
Suppose that Mikki has 3 years of data, as shown.
Calculate seasonal indices, rounded to two decimal
places.
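One common way to calculate seasonal indices from several years of data is sketched below. For each year, divide every season's value by that year's seasonal average, then average the results for each season across the years. The sales figures are invented for illustration (Mikki's table is not reproduced in these notes), and the method is an assumed standard approach rather than one quoted from the notes.

```python
SEASONS = ["Summer", "Autumn", "Winter", "Spring"]

# Hypothetical quarterly sales for three years (NOT Mikki's data)
sales = {
    "Year 1": [120, 80, 140, 60],
    "Year 2": [130, 90, 120, 60],
    "Year 3": [110, 100, 130, 60],
}

def yearly_ratios(values):
    """Divide each season's value by the seasonal average for that year."""
    average = sum(values) / len(values)
    return [v / average for v in values]

# One list of ratios per year
ratios = [yearly_ratios(values) for values in sales.values()]

# Seasonal index = mean of that season's ratios across the years, to 2 d.p.
indices = [round(sum(year[s] for year in ratios) / len(ratios), 2)
           for s in range(len(SEASONS))]

for season, si in zip(SEASONS, indices):
    print(season, si)
print("sum of indices:", round(sum(indices), 2))
```

The indices add to the number of seasons (4 here), up to rounding.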
Example 6
De-seasonalising a time series
The quarterly sales figures for Mikki's shop over a 3-year period are as shown. Use the seasonal indices shown to de-seasonalise these sales figures. Write answers rounded to the nearest whole number.
Summer   Autumn   Winter   Spring
1.16     0.94     1.26     0.64
5E Fitting a trend line and forecasting
Example 1
The table below shows the number of female students in Victoria enrolled in at least one subject in the
Mathematics learning area at year 12 over the period 2010–18.
Example 2
Fitting a trend line (seasonality)
The deseasonalised quarterly sales data from Mikki's shop are shown below.
End of Notes