
Assignment3 Zhao Zihui

1. The document provides instructions for setting up and submitting an assignment on time series forecasting using R and Quarto. It discusses installing necessary software, organizing files, and submitting assignments. 2. The assignment asks students to fit Holt-Winters exponential smoothing models to a time series dataset, check model residuals for stationarity, perform time series cross-validation, and select the best model based on forecast accuracy. 3. Additional questions explore properties of moving averages, differences, and the relationship between population and sample autocorrelation functions for stochastic processes. Students are asked to derive properties, perform simulations, and explain results.

Uploaded by

zhaozhaozizizi2
Copyright
© © All Rights Reserved

ST5209/X Assignment 3

ZHAO ZIHUI A0273946N

Set up

1. Make sure you have the following installed on your system: LaTeX, R 4.2.2+, RStudio
2023.12+, and Quarto 1.3.450+.
2. Pull changes from the course repo.
3. Create a separate folder in the root directory of the repo, label it with your name,
e.g. yanshuo-assignments
4. Copy the assignment1.qmd file over to this directory.
5. Modify the duplicated document with your solutions, writing all R code as code chunks.
6. When running code, make sure your working directory is set to be the folder with your
assignment .qmd file, e.g. yanshuo-assignments. This is to ensure that all file paths
are valid.[1]

Submission

1. Render the document to get a .pdf printout.


2. Submit both the .qmd and .pdf files to Canvas.

1. Holt-Winters, residuals, and forecast accuracy

Consider the antidiabetic drug sales time series which can be loaded using the following code
snippet.

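library(fpp3)   # assumed setup: attaches tsibble, fable, feasts, dplyr, and ggplot2
library(readr)  # assumed setup: provides read_rds()

diabetes <- read_rds("D:/st5209/assignment3/diabetes.rds") |>
  select(TotalC)
diabetes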

[1] You may view and set the working directory using getwd() and setwd().

# A tsibble: 204 x 2 [1M]
    TotalC    Month
     <dbl>    <mth>
 1 3526591 1991 Jul
 2 3180891 1991 Aug
 3 3252221 1991 Sep
 4 3611003 1991 Oct
 5 3565869 1991 Nov
 6 4306371 1991 Dec
 7 5088335 1992 Jan
 8 2814520 1992 Feb
 9 2985811 1992 Mar
10 3204780 1992 Apr
# i 194 more rows

a. Fit the following exponential smoothing models on the entire time series:

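• Holt-Winters with multiplicative noise and seasonality,
• Holt-Winters with a log transformation, with additive noise and seasonality,
• Holt-Winters with multiplicative noise and seasonality, and damping.

dfit <- diabetes |>
  model(
    # multiplicative errors and seasonality, no trend term
    etsmtplns = ETS(TotalC ~ error("M") + trend("N") + season("M")),
    # log-transformed response, additive errors and seasonality, no trend term
    etslogtr = ETS(log(TotalC) ~ error("A") + trend("N") + season("A")),
    # multiplicative errors and seasonality, damped additive trend
    etsmtplnsdamp = ETS(TotalC ~ error("M") + trend("Ad") + season("M"))
  )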

b. Make ACF plots for the innovation residuals of these three models. What can you say
about stationarity of the residuals from the plot?
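Answer: The ACF plots of the three models show 7, 6, and 4 spikes that lie
outside the confidence band for white noise. Thus, the residuals of all three
models show significant autocorrelation and do not behave like white noise.
The spikes by themselves indicate leftover structure rather than non-stationarity.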

dfit |>
augment() |>
ACF(.innov) |>
autoplot()

[Figure: ACF of the innovation residuals, one panel per model (etslogtr, etsmtplns, etsmtplnsdamp); x-axis: lag [1M]; y-axis: acf, ranging from −0.3 to 0.2.]

c. Calculate the p-value from a Ljung-Box test on the residuals with lag ℎ = 8. What can
you say about the stationarity of the residuals from the p-value? What does this mean
about the model?
Answer: The p-values for all three models are below 0.05, so we reject the
null hypothesis that the residuals are white noise. This means that the models
have not captured all of the structure in the series.

dfit |>
augment() |>
features(.innov,ljung_box,lag = 8)

# A tibble: 3 x 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 etslogtr 32.4 0.0000780
2 etsmtplns 18.3 0.0188
3 etsmtplnsdamp 19.2 0.0141
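
For reference, a Ljung-Box statistic of this kind can be reproduced by hand. The sketch below uses base R only and a simulated white-noise series (not the diabetes residuals), so its numbers are unrelated to the table above. The helper name `ljung_box_manual` is hypothetical, and the degrees of freedom are assumed to equal the lag, with no adjustment for fitted parameters.

```r
# Ljung-Box statistic: Q = n (n + 2) * sum_{k=1}^{h} r_k^2 / (n - k),
# where r_k is the sample autocorrelation at lag k.
ljung_box_manual <- function(x, h) {
  n <- length(x)
  # acf() returns lag 0 first, so drop it to keep lags 1..h
  r <- as.numeric(acf(x, lag.max = h, plot = FALSE)$acf)[-1]
  q <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))
  # Under the white-noise null, Q is approximately chi-squared with h df
  p <- pchisq(q, df = h, lower.tail = FALSE)
  c(statistic = q, p_value = p)
}

set.seed(1)
x <- rnorm(204)  # simulated white noise, same length as the data
manual <- ljung_box_manual(x, h = 8)
builtin <- Box.test(x, lag = 8, type = "Ljung-Box")
manual
builtin
```

The hand-computed statistic and p-value should agree with `Box.test()`; a small p-value is evidence against the white-noise hypothesis.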

d. Perform time series cross-validation for the three methods, using .init = 50 and .step
= 10, and with the forecast horizon ℎ = 4. Which method has the best RMSSE? How
many data points is the error averaged over in total?
Answer: The model with a log transformation and additive noise and seasonality
(etslogtr) has the best RMSSE, 0.892. With an initial window size of 50, an
incremental step of 10, and 204 data points in diabetes, there are
⌊(204 − 50)/10⌋ + 1 = 16 splits in total. Since only the 4-step-ahead forecast
is kept from each split, the error for each model is averaged over 16 data points.
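
The count of splits can be checked with a few lines of base R. This is a sketch of the arithmetic only: it assumes the training windows end at observations 50, 60, and so on, and it does not call `stretch_tsibble()` itself. The variable names are illustrative.

```r
n    <- 204  # number of monthly observations in diabetes
init <- 50   # .init: size of the first training window
step <- 10   # .step: growth of the window between splits
h    <- 4    # forecast horizon

# Training windows end at 50, 60, ..., up to the last value not exceeding n
train_ends <- seq(init, n, by = step)
n_folds <- length(train_ends)

# A split contributes an h-step-ahead error only if the target month exists
n_errors_h <- sum(train_ends + h <= n)

n_folds
n_errors_h
```

Under these assumptions the last window ends at observation 200, so its 4-step-ahead target (observation 204) is still inside the data.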

# Expanding-window training sets: first window of 50 months, growing by 10
dcro <- diabetes |> stretch_tsibble(.step = 10, .init = 50)

# Refit the three models on every training window
dfit <- dcro |>
  model(
    etsmtplns = ETS(TotalC ~ error("M") + trend("N") + season("M")),
    etslogtr = ETS(log(TotalC) ~ error("A") + trend("N") + season("A")),
    etsmtplnsdamp = ETS(TotalC ~ error("M") + trend("Ad", phi = 0.8) + season("M"))
  )

# Forecast 4 steps ahead, keep only the 4-step-ahead forecasts, then score them
dfit |>
  forecast(h = 4) |>
  group_by(.id) |>
  mutate(h = (row_number() - 1) %% 4 + 1) |>
  ungroup() |>
  filter(h == 4) |>
  as_fable(response = "TotalC", distribution = TotalC) |>
  accuracy(diabetes) |>
  select(.model, RMSSE, MASE) |>
  arrange(RMSSE)

# A tibble: 3 x 3
.model RMSSE MASE
<chr> <dbl> <dbl>
1 etslogtr 0.892 0.791
2 etsmtplns 0.937 0.823
3 etsmtplnsdamp 1.05 1.00
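
To make the RMSSE column concrete, the sketch below computes the root mean squared scaled error from its textbook definition, using base R on made-up numbers. The function name `rmsse_sketch`, its arguments, and the toy data are hypothetical, and this is not a claim about how `accuracy()` computes the value internally.

```r
# RMSSE: forecast errors scaled by the in-sample mean squared error
# of the seasonal naive method with period m (m = 12 for monthly data).
rmsse_sketch <- function(actual, fc, train, m = 1) {
  errors <- actual - fc
  # In-sample seasonal differences y_t - y_{t-m}
  scale <- mean(diff(train, lag = m)^2)
  sqrt(mean(errors^2) / scale)
}

# Toy example (made-up numbers, not the diabetes data)
train  <- c(1, 2, 3, 4, 5, 6)  # every one-step difference is 1, so scale = 1
actual <- c(7, 8)
fc     <- c(4, 4)              # errors are 3 and 4
toy_rmsse <- rmsse_sketch(actual, fc, train, m = 1)
toy_rmsse
```

A value below 1 means the forecasts are, on average, more accurate than the in-sample one-step seasonal naive forecasts used for scaling.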

2. Moving averages and differences

Consider the linear trend model

$$X_t = \beta_0 + \beta_1 t + W_t.$$

Define a time series $(Y_t)$ by taking a moving average of $(X_t)$ with a symmetric window of size
7. Define another time series $(Z_t)$ by taking a difference of $(X_t)$.
a. What is the mean function for $(Y_t)$? What is the ACVF for $(Y_t)$?
b. What is the mean function for $(Z_t)$? What is its ACVF?
c. What is the CCF of $(Y_t)$ and $(Z_t)$?
d. Are $(Y_t)$ and $(Z_t)$ jointly stationary?

Answer:
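
A sketch of how parts (a) to (c) could be worked out, under assumptions that the question does not state explicitly: $(W_t)$ is white noise with mean 0 and variance $\sigma_W^2$, the moving average is the equally weighted average over $t-3, \ldots, t+3$, the difference is the first backward difference, and the cross-covariance convention is $\gamma_{YZ}(h) = \operatorname{Cov}(Y_{t+h}, Z_t)$.

```latex
% Assumption: (W_t) is white noise with mean 0 and variance \sigma_W^2.
\begin{align*}
Y_t &= \frac{1}{7}\sum_{j=-3}^{3} X_{t+j}, &
\mu_Y(t) &= \beta_0 + \beta_1 t, \\
\gamma_Y(h) &= \begin{cases}
  \dfrac{(7-|h|)\,\sigma_W^2}{49}, & |h| \le 6, \\
  0, & |h| > 6,
\end{cases} \\
Z_t &= X_t - X_{t-1} = \beta_1 + W_t - W_{t-1}, &
\mu_Z(t) &= \beta_1, \\
\gamma_Z(h) &= \begin{cases}
  2\sigma_W^2, & h = 0, \\
  -\sigma_W^2, & |h| = 1, \\
  0, & |h| > 1,
\end{cases} \\
\gamma_{YZ}(h) &= \operatorname{Cov}(Y_{t+h}, Z_t)
  = \frac{\sigma_W^2}{7}\left(\mathbf{1}\{|h| \le 3\} - \mathbf{1}\{|h+1| \le 3\}\right), \\
\rho_{YZ}(h) &= \frac{\gamma_{YZ}(h)}{\sqrt{\gamma_Y(0)\,\gamma_Z(0)}}
  = \begin{cases}
  1/\sqrt{14}, & h = 3, \\
  -1/\sqrt{14}, & h = -4, \\
  0, & \text{otherwise.}
\end{cases}
\end{align*}
```

Under these assumptions every covariance depends only on the lag $h$, but $\mu_Y(t)$ depends on $t$ whenever $\beta_1 \neq 0$. In that case $(Y_t)$ is not stationary, so the pair cannot be jointly stationary; with $\beta_1 = 0$ it would be.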

3. Sample vs population ACF

Consider the signal plus noise model

$$X_t = \sin(2\pi t/5) + W_t.$$

a. What is the ACF of $(X_t)$?


Answer:
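
A sketch of the calculation, assuming $(W_t)$ is white noise with mean 0 and variance $\sigma_W^2$ (the question does not state this explicitly). Because the sinusoid is deterministic, it affects only the mean function and not the covariances.

```latex
% Assumption: (W_t) is white noise with mean 0 and variance \sigma_W^2.
\begin{align*}
\mu_X(t) &= \mathbb{E}[X_t] = \sin(2\pi t/5), \\
\gamma_X(s,t) &= \operatorname{Cov}(X_s, X_t) = \operatorname{Cov}(W_s, W_t)
  = \begin{cases} \sigma_W^2, & s = t, \\ 0, & s \neq t, \end{cases} \\
\rho_X(s,t) &= \frac{\gamma_X(s,t)}{\sqrt{\gamma_X(s,s)\,\gamma_X(t,t)}}
  = \begin{cases} 1, & s = t, \\ 0, & s \neq t. \end{cases}
\end{align*}
```

Under this assumption the population ACF is the same as that of white noise: 1 at lag 0 and 0 at every other lag.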

b. Simulate a time series $X_1, X_2, \ldots, X_{200}$ from this model and plot its sample ACF.

set.seed(5209)
plot1 <- tibble(
  t = seq(1, 200, 1),
  wn = rnorm(200),
  xt = sin(2 * pi * t / 5) + wn
) |>
  as_tsibble(index = t)

plot1 |>
  ACF(xt) |>
  autoplot()

[Figure: sample ACF of xt; x-axis: lag [1], up to 20; y-axis: acf, ranging from −0.3 to 0.3.]

c. Why does the sample ACF not look like the population ACF?
Answer:
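
One possible line of reasoning: the sample ACF centres the data at the overall sample mean (close to 0 here) rather than at the time-varying mean sin(2πt/5), so the deterministic sinusoid is treated as if it were random variation. The base R sketch below simulates a much longer series than the 200 points used in part (b) and compares its sample ACF with the large-sample limit cos(2πh/5)/3. That limit comes from averaging the sinusoid over a full period and assumes W_t is N(0, 1) white noise, as in the simulation. All variable names are illustrative.

```r
set.seed(5209)
n <- 20000                       # long series to reduce sampling noise
time_idx <- seq_len(n)
xt <- sin(2 * pi * time_idx / 5) + rnorm(n)  # W_t assumed N(0, 1)

lags <- 1:10
# acf() returns lag 0 first, so drop it to keep lags 1..10
sample_acf <- as.numeric(acf(xt, lag.max = 10, plot = FALSE)$acf)[-1]

# Large-sample limit when the signal is not removed:
# (1/2) cos(2*pi*h/5) / (1/2 + 1) = cos(2*pi*h/5) / 3
limit_acf <- cos(2 * pi * lags / 5) / 3

round(cbind(lag = lags, sample_acf, limit_acf), 3)
```

The limit has period 5 and peaks near 1/3 at lags 5, 10, and so on, which is consistent with the range of the sample ACF plotted in part (b).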

d. Why does the asymptotic normality theorem for the ACF not apply?

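Answer: The time series is not stationary, since $E(X_t) = \sin(2\pi t/5)$ depends on $t$.
The asymptotic normality theorem for the sample ACF assumes a stationary process, so the
limit theorems behind it (WLLN and CLT) do not hold here.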
