Assignment3 Zhao Zihui
Set up
1. Make sure you have the following installed on your system: LaTeX, R 4.2.2+, RStudio
2023.12+, and Quarto 1.3.450+.
2. Pull changes from the course repo.
3. Create a separate folder in the root directory of the repo and label it with your name,
e.g. yanshuo-assignments.
4. Copy the assignment1.qmd file over to this directory.
5. Modify the duplicated document with your solutions, writing all R code as code chunks.
6. When running code, make sure your working directory is set to be the folder with your
assignment .qmd file, e.g. yanshuo-assignments. This is to ensure that all file paths
are valid.1
Submission
Consider the antidiabetic drug sales time series which can be loaded using the following code
snippet.
1 You may view and set the working directory using getwd() and setwd().
# A tsibble: 204 x 2 [1M]
    TotalC    Month
     <dbl>    <mth>
 1 3526591 1991 Jul
 2 3180891 1991 Aug
 3 3252221 1991 Sep
 4 3611003 1991 Oct
 5 3565869 1991 Nov
 6 4306371 1991 Dec
 7 5088335 1992 Jan
 8 2814520 1992 Feb
 9 2985811 1992 Mar
10 3204780 1992 Apr
# i 194 more rows
a. Fit the following exponential smoothing models on the entire time series:
b. Make ACF plots for the innovation residuals of these three models. What can you say
about stationarity of the residuals from the plot?
Answer: From the plot, we see that 7, 6 and 4 spikes, respectively, lie outside
the confidence band for white noise in the three models' ACF plots. Thus, the
residuals of all three models are autocorrelated and do not behave like white noise.
dfit |>
augment() |>
ACF(.innov) |>
autoplot()
[Figure: ACF plots of the innovation residuals, one panel each for etslogtr, etsmtplns and etsmtplnsdamp; x-axis: lag [1M], y-axis: acf]
c. Calculate the p-value from a Ljung-Box test on the residuals with lag ℎ = 8. What can
you say about the stationarity of the residuals from the p-value? What does this mean
about the model?
Answer: The p-values for all three models are below 0.05, so we reject the
null hypothesis that the residuals are white noise. This means that each model
leaves some autocorrelation structure in the data unexplained.
dfit |>
augment() |>
features(.innov, ljung_box, lag = 8)
# A tibble: 3 x 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 etslogtr 32.4 0.0000780
2 etsmtplns 18.3 0.0188
3 etsmtplnsdamp 19.2 0.0141
d. Perform time series cross-validation for the three methods, using .init = 50 and .step
= 10, and with the forecast horizon ℎ = 4. Which method has the best RMSSE? How
many data points is the error averaged over in total?
Answer: The Holt-Winters model with a log transformation and additive noise
and seasonality (etslogtr) has the best RMSSE, which is 0.89. Given the initial
window size of 50, the incremental step of 10, and the 204 data points in
diabetes, there are ⌊(204 − 50)/10⌋ + 1 = 16 splits in total. Each split
contributes ℎ = 4 forecast errors, so the error is averaged over 16 × 4 = 64
data points.
# A tibble: 3 x 3
.model RMSSE MASE
<chr> <dbl> <dbl>
1 etslogtr 0.892 0.791
2 etsmtplns 0.937 0.823
3 etsmtplnsdamp 1.05 1.00
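The split count in part d can be checked with a few lines of base R. This is only a sketch of the arithmetic, assuming the training windows grow from .init = 50 in increments of .step = 10 and that a split is used only when all ℎ = 4 future observations exist; the variable names are illustrative and do not come from the assignment code.

```r
# Sketch of the cross-validation bookkeeping (names are illustrative).
n    <- 204  # observations in the series
init <- 50   # size of the first training window (.init)
step <- 10   # growth of the training window per split (.step)
h    <- 4    # forecast horizon

# Candidate training-window sizes: 50, 60, ..., 200
train_sizes <- seq(init, n, by = step)

# Keep only the splits that still have h observations left to forecast
train_sizes <- train_sizes[train_sizes + h <= n]

n_splits <- length(train_sizes)  # number of splits
n_errors <- n_splits * h         # forecast errors entering the average

print(c(splits = n_splits, errors = n_errors))
```

Under these assumptions the last training window ends at observation 200, so its four forecasts cover observations 201 to 204 and no split has to be dropped.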
𝑋𝑡 = 𝛽₀ + 𝛽₁𝑡 + 𝑊𝑡.
Define a time series (𝑌𝑡 ) by taking a moving average of (𝑋𝑡 ) with a symmetric window of size
7. Define another time series (𝑍𝑡 ) by taking a difference of (𝑋𝑡 ).
a. What is the mean function for (𝑌𝑡 )? What is the ACVF for (𝑌𝑡 )?
b. What is the mean function for (𝑍𝑡 )? What is its ACVF?
c. What is the CCF of (𝑌𝑡 ) and (𝑍𝑡 )?
d. Are (𝑌𝑡 ) and (𝑍𝑡 ) jointly stationary?
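One possible way to work through parts a to d is sketched below. The sketch rests on assumptions that the surviving problem text does not state: that (𝑊𝑡 ) is white noise with variance σ², that the moving average is 𝑌𝑡 = (1/7) Σ 𝑋𝑡+𝑗 over 𝑗 = −3, …, 3, that the difference is 𝑍𝑡 = 𝑋𝑡 − 𝑋𝑡−1, and that the cross-covariance is defined as Cov(𝑌𝑡+ℎ , 𝑍𝑡 ).

```latex
\begin{align*}
Y_t &= \frac{1}{7}\sum_{j=-3}^{3} X_{t+j}
     = \beta_0 + \beta_1 t + \frac{1}{7}\sum_{j=-3}^{3} W_{t+j},
  & \mathbb{E}[Y_t] &= \beta_0 + \beta_1 t, \\
\gamma_Y(h) &=
  \begin{cases}
    \dfrac{\sigma^2\,(7 - |h|)}{49}, & |h| \le 6, \\
    0, & |h| > 6,
  \end{cases} \\
Z_t &= X_t - X_{t-1} = \beta_1 + W_t - W_{t-1},
  & \mathbb{E}[Z_t] &= \beta_1, \\
\gamma_Z(h) &=
  \begin{cases}
    2\sigma^2, & h = 0, \\
    -\sigma^2, & |h| = 1, \\
    0, & |h| > 1,
  \end{cases} \\
\gamma_{YZ}(h) &= \operatorname{Cov}(Y_{t+h}, Z_t) =
  \begin{cases}
    \sigma^2/7, & h = 3, \\
    -\sigma^2/7, & h = -4, \\
    0, & \text{otherwise},
  \end{cases} \\
\rho_{YZ}(h) &= \frac{\gamma_{YZ}(h)}{\sqrt{\gamma_Y(0)\,\gamma_Z(0)}} =
  \begin{cases}
    1/\sqrt{14}, & h = 3, \\
    -1/\sqrt{14}, & h = -4, \\
    0, & \text{otherwise}.
  \end{cases}
\end{align*}
```

Under these assumptions, every covariance depends only on the lag ℎ, but the mean of (𝑌𝑡 ) depends on 𝑡 whenever 𝛽₁ ≠ 0. In that case (𝑌𝑡 ) is not stationary, so (𝑌𝑡 ) and (𝑍𝑡 ) cannot be jointly stationary.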
Answer:
3. Sample vs population ACF
𝑋𝑡 = sin(2𝜋𝑡/5) + 𝑊𝑡 .
b. Simulate a time series 𝑋1 , 𝑋2 , … , 𝑋200 from this model and plot its sample ACF.
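library(fpp3)  # loads tibble, tsibble and feasts

set.seed(5209)
# Simulate X_t = sin(2*pi*t/5) + W_t for t = 1, ..., 200
plot1 <- tibble(
  t = 1:200,
  wn = rnorm(200),
  xt = sin(2 * pi * t / 5) + wn
) |>
  as_tsibble(index = t)
# Plot the sample ACF of the simulated series
plot1 |>
  ACF(xt) |>
  autoplot()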
[Figure: sample ACF of xt; x-axis: lag [1], y-axis: acf]
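The period-5 pattern in the sample ACF can be compared against a simple large-sample approximation. The sample ACF subtracts a single overall mean rather than the time-varying mean sin(2𝜋𝑡/5), so the sinusoid contributes roughly cos(2𝜋ℎ/5)/2 to the sample autocovariance at lag ℎ and 1/2 to the sample variance. With unit noise variance, this suggests a sample ACF of about cos(2𝜋ℎ/5)/3 for ℎ ≠ 0. The sketch below uses base R's acf() instead of the ACF() function used above, and the names sample_acf and approx_acf are illustrative.

```r
# Sketch: compare the sample ACF with the approximation cos(2*pi*h/5) / 3.
# Base R only; acf() stands in for feasts::ACF() here.
set.seed(5209)
t <- 1:200
x <- sin(2 * pi * t / 5) + rnorm(200)

lags <- 1:20
# acf() returns lag 0 first, so drop it to keep lags 1, ..., 20
sample_acf <- as.numeric(acf(x, lag.max = 20, plot = FALSE)$acf)[-1]
# Large-sample approximation, assuming the noise has variance 1
approx_acf <- cos(2 * pi * lags / 5) / 3

# Side-by-side comparison of the two curves
print(round(data.frame(lag = lags, sample = sample_acf, approx = approx_acf), 3))
```

Both columns should peak at lags that are multiples of 5, whereas the population ACF of this model is zero at every nonzero lag.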
c. Why does the sample ACF not look like the population ACF function?
Answer:
d. Why does the asymptotic normality theorem for the ACF not apply?
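Answer: The time series is not stationary, since 𝐸(𝑋𝑡 ) = sin(2𝜋𝑡/5) depends on 𝑡. The
asymptotic normality theorem for the sample ACF assumes a stationary process, so it does
not apply, and the limit results it relies on (WLLN and CLT) do not hold here.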