Lab 4
Packages
library(forecast)
library(lmtest)
library(tseries)
Summary
In this lab we will:
1. Recap the last three computer labs.
2. Learn how to use an R function, written for this course, to convert an Arima forecast of logged prices to a forecast of regular prices.
3. Learn how to plot this forecast.
4. Test for ARCH effects.
1. Recap
Let’s start with what we have learned so far.
Step I: I read my dataset, which I downloaded from Yahoo Finance: Google’s stock data from January 2015 until January 2022. For simplicity, in the computer lab I keep only the columns date and Adj.Close, and I make sure that I have only one observation per month.
Step II: I use a plot to describe the time series. I use the “ts” function and I create the log price and the log return.
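The construction of the log price and the log return is not shown in this handout. A minimal sketch, using made-up monthly prices rather than the GOOG data (the values and the names `price_ts`, `log_price` and `log_return` are assumptions for illustration), could look like this:

```r
# Hypothetical monthly adjusted closing prices (illustrative values only)
prices <- c(100, 104, 101, 108, 112)

# Monthly time series starting in January 2015
price_ts <- ts(prices, start = c(2015, 1), frequency = 12)

# Log price
log_price <- log(price_ts)

# Log return: first difference of the log price (one observation shorter)
log_return <- diff(log_price)

log_return
```

Because the first observation has no predecessor, the log return series is one observation shorter than the price series.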
ts.plot(GOOG$logreturn_Adj.close)
abline(h = 0, lty = 2, col = "dark red")
[Figure: time series plot of GOOG$logreturn_Adj.close against Time, with a dashed horizontal line at 0.]
Do I see any clear pattern in my diagram? Can I say that my time series is stationary? Is there a way to test whether the time series is stationary or not?
Step III: I run a regular Dickey-Fuller test.
adf.test(GOOG$logreturn_Adj.close[-(1:2)], k = 0)
##
## Augmented Dickey-Fuller Test
##
## data: GOOG$logreturn_Adj.close[-(1:2)]
## Dickey-Fuller = -10.818, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary
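Here the p-value of 0.01 lets us reject the null hypothesis of a unit root, so the log return can be treated as stationary.
Note about the Dickey-Fuller test: Transform your data until your time series is stationary.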
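To see what the regular Dickey-Fuller test (k = 0) does, here is a rough sketch of the underlying regression in base R on simulated data. It assumes a regression with a constant and a linear trend, which to my understanding is what adf.test uses. The names `y_lag`, `trend` and `df_stat` are made up for this illustration.

```r
set.seed(1)
# Simulated stationary series (white noise), standing in for the log returns
y <- rnorm(200)

dy    <- diff(y)            # y_t - y_{t-1}
y_lag <- y[-length(y)]      # y_{t-1}
trend <- seq_along(dy)      # linear time trend

# Dickey-Fuller regression with constant and trend, no lagged differences
df_fit <- lm(dy ~ y_lag + trend)

# The test statistic is the t value of the coefficient on y_{t-1}
df_stat <- coef(summary(df_fit))["y_lag", "t value"]
df_stat
```

A large negative statistic speaks against a unit root. Note that the statistic must be compared with Dickey-Fuller critical values, not with the usual t distribution, which is why we rely on adf.test for the p-value.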
Step IV: I create a variable called time, which I will need later, and I split the time series into a training set and a testing set.
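The code for this step is not printed in the handout. A possible sketch, assuming 85 monthly observations with the last 4 held out for testing (as the later plots suggest), is shown below. The data frame `GOOG_demo` and its simulated prices are stand-ins, not the real data.

```r
# Hypothetical stand-in for the GOOG data frame: 85 monthly observations
set.seed(42)
n_obs <- 85
GOOG_demo <- data.frame(Adj.Close = 40 + cumsum(abs(rnorm(n_obs))))

# Time index, used later as the regressor in the trend regression
GOOG_demo$time <- seq_len(n_obs)

# Hold out the last 4 observations for testing
h <- 4
training_demo <- GOOG_demo[seq_len(n_obs - h), ]
testing_demo  <- GOOG_demo[(n_obs - h + 1):n_obs, ]

nrow(training_demo)
nrow(testing_demo)
```

The split is by position, not random, because the order of the observations matters in a time series.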
Step V: I estimate a random walk using the training set. Let us create a forecast assuming that the log price is a random walk with a drift. This means that we will fit an ARIMA(0,1,0) to the log price. Alternatively, we could fit an ARIMA(0,0,0) to the log return, but this would make forecasting a little harder.
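To build some intuition for the random walk with drift, here is a small base R sketch on simulated log prices (not the GOOG data; all names and numbers are assumptions). In this model the drift estimate is simply the average log return, and the h-step forecast of the log price is the last observed value plus h times the drift.

```r
set.seed(123)
# Simulated log prices following a random walk with drift (illustrative only)
true_drift <- 0.01
log_price  <- 4 + cumsum(rnorm(81, mean = true_drift, sd = 0.07))

# In a random walk with drift, the log returns have mean equal to the drift
log_ret   <- diff(log_price)
drift_hat <- mean(log_ret)

# h-step-ahead point forecast of the log price: last value + h * drift
h      <- 1:4
fc_log <- log_price[length(log_price)] + h * drift_hat

drift_hat
fc_log
```

The output that follows is from the lab's own fit on the GOOG training set, not from this sketch.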
## Series: training$log_Adj.close
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 0.0119
## s.e. 0.0075
##
## sigma^2 = 0.004538: log likelihood = 102.8
## AIC=-201.6 AICc=-201.45 BIC=-196.84
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 4.448157e-05 0.06652704 0.05162596 -0.005778455 1.224137 0.9506535
## ACF1
## Training set -0.1613332
## 2.5 % 97.5 %
## drift -0.002782661 0.02655799
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## drift 0.011888 0.007485 1.5882 0.1122
checkresiduals(ARIMA010)
[Figure: Residuals from ARIMA(0,1,0) with drift. Panels: residuals over time, ACF of the residuals by lag, and histogram of the residuals.]
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,0) with drift
## Q* = 9.2848, df = 10, p-value = 0.5053
##
## Model df: 0. Total lags used: 10
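The Ljung-Box test reported by checkresiduals can also be run directly with Box.test from base R. The sketch below uses simulated residuals (an assumption, not the model residuals from this lab) and shows where the degrees of freedom come from.

```r
set.seed(7)
# Simulated residuals standing in for the residuals of a fitted model
res <- rnorm(80)

# Ljung-Box test on the first 10 autocorrelations.
# fitdf = number of estimated AR and MA coefficients (0 for ARIMA(0,1,0))
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 0)

lb$statistic
lb$parameter   # degrees of freedom = lag - fitdf
lb$p.value
```

A large p-value means that we cannot reject the null hypothesis that the residuals are uncorrelated. For a model with one AR and one MA term, fitdf would be 2 and the degrees of freedom would drop from 10 to 8.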
##
## Call:
## lm(formula = Adj.Close ~ time, data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.107 -11.687 1.010 6.737 39.004
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.23703 3.36707 6.01 5.4e-08 ***
## time 1.27185 0.07134 17.83 < 2e-16 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 15.01 on 79 degrees of freedom
## Multiple R-squared: 0.8009, Adjusted R-squared: 0.7984
## F-statistic: 317.8 on 1 and 79 DF, p-value: < 2.2e-16
### how to plot the regression model with the training set and
### including the forecast and the testing set
plot((1:nrow(GOOG)), GOOG$Adj.Close, type = "l", xlim = c(0, 85))
lines((1:(nrow(GOOG) - 4)), regression$fitted.values, col = "dark blue", lwd = 2, lty = 2)
lines(82:85, testing$Adj.Close, type = "o", col = "dark red", lwd = 2)
lines(82:85, regression_forecast, type = "o", col = "dark blue", lwd = 2)
[Figure: GOOG$Adj.Close against (1:nrow(GOOG)), with the fitted regression line, the regression forecast, and the testing values.]
## testing estimated residuals
## 1 94.66 124.5290 -29.86901
## 2 101.45 125.8009 -24.35087
## 3 88.73 127.0727 -38.34272
## 4 99.87 128.3446 -28.47457
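Calculate the RMSE for the regression model using the table above. Note that the code below calls rmsle(), which computes the root mean squared log error, so the value 0.277366 is not on the same scale as the rmse() values reported for the other models; use rmse() for a like-for-like comparison.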
library(Metrics)
##
## Attaching package: ’Metrics’
RMSE_regression<-rmsle(regression_table$testing,regression_table$estimated)
RMSE_regression
## [1] 0.277366
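To make the difference between the two error measures concrete, the sketch below recomputes both by hand in base R from the four rows of the regression table above. The variable names are my own. The log version uses log(1 + x), which is how I understand rmsle to be defined.

```r
# Testing values and regression forecasts, copied from the table above
testing   <- c(94.66, 101.45, 88.73, 99.87)
estimated <- c(124.5290, 125.8009, 127.0727, 128.3446)

# Root mean squared error, in the same units as the price
rmse_manual <- sqrt(mean((testing - estimated)^2))

# Root mean squared log error, computed on log(1 + x)
rmsle_manual <- sqrt(mean((log1p(testing) - log1p(estimated))^2))

rmse_manual
rmsle_manual
```

The second number reproduces the 0.277366 shown above, while the first one, roughly 30.7, is the value to compare with the RMSE of the other models.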
##
## Forecast method: Holt’s method
##
## Model Information:
## Holt’s method
##
## Call:
## holt(y = training$Adj.Close, h = 4, exponential = FALSE)
##
## Smoothing parameters:
## alpha = 0.5209
## beta = 0.2279
##
## Initial states:
## l = 36.3075
## b = -0.1851
##
## sigma: 5.7216
##
## AIC AICc BIC
## 644.4166 645.2166 656.3888
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.2593424 5.57854 3.94528 -0.1856814 5.374785 0.9461912
## ACF1
## Training set 0.04402264
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 82 95.13739 87.80486 102.46992 83.92325 106.3515
## 83 90.16544 81.00525 99.32562 76.15614 104.1747
## 84 85.19349 73.56632 96.82065 67.41128 102.9757
## 85 80.22153 65.62033 94.82274 57.89092 102.5521
plot(holt_model)
lines((nrow(GOOG) - 3):nrow(GOOG), testing$Adj.Close, col = "dark red", lwd = 2, type = "o")
[Figure: plot of holt_model with the testing values overlaid.]
holt_table
## 1 94.66 95.13739 -0.4773835
## 2 101.45 90.16544 11.2845606
## 3 88.73 85.19349 3.5365177
## 4 99.87 80.22153 19.6484688
RMSE_HOlt<-rmse(holt_table$testing,holt_table$`Holt's forecast`)
RMSE_HOlt
## [1] 11.46885
[Figure: ACF and PACF plots, lags 1 to 12.]
### model p = 4, d = 1, q = 4
ARIMA414 <- Arima(training$log_Adj.close, order = c(4, 1, 4), include.drift = TRUE)
summary(ARIMA414)
## Series: training$log_Adj.close
## ARIMA(4,1,4) with drift
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## 0.5028 -0.0516 0.8321 -0.6155 -0.6951 0.0119 -0.7309 0.9673
## s.e. 0.1569 0.1050 0.1052 0.1563 0.1692 0.1603 0.1176 0.1364
## drift
## 0.0097
## s.e. 0.0103
##
## sigma^2 = 0.003702: log likelihood = 112.17
## AIC=-204.34 AICc=-201.15 BIC=-180.52
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0005269856 0.05696635 0.04394368 0.01383662 1.046896 0.80919
## ACF1
## Training set -0.0738096
coeftest(ARIMA414)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.5027654 0.1568676 3.2050 0.00135 **
## ar2 -0.0516019 0.1050358 -0.4913 0.62323
## ar3 0.8321288 0.1052306 7.9077 2.622e-15 ***
## ar4 -0.6154997 0.1563331 -3.9371 8.247e-05 ***
## ma1 -0.6950654 0.1691727 -4.1086 3.980e-05 ***
## ma2 0.0118910 0.1603453 0.0742 0.94088
## ma3 -0.7308628 0.1176463 -6.2124 5.219e-10 ***
## ma4 0.9672586 0.1364291 7.0898 1.343e-12 ***
## drift 0.0097372 0.0103336 0.9423 0.34605
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
One AR term and one MA term are not significant in the model. MA2 has the highest p-value, 0.94088, so we drop this term from the model.
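The z value and p-value reported by coeftest can be checked by hand: the z value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of the standard normal distribution. The sketch below does this for ma2 and then picks the AR or MA term with the largest p-value, using the numbers from the table above. The variable names are my own.

```r
# Estimate and standard error of ma2, copied from the coeftest output above
est <- 0.0118910
se  <- 0.1603453

z <- est / se               # z value
p <- 2 * pnorm(-abs(z))     # two-sided p-value

# p-values of the AR and MA terms, copied from the same output
p_values <- c(ar1 = 0.00135, ar2 = 0.62323, ar3 = 2.622e-15, ar4 = 8.247e-05,
              ma1 = 3.980e-05, ma2 = 0.94088, ma3 = 5.219e-10, ma4 = 1.343e-12)

# Candidate to drop: the term with the largest p-value
drop_term <- names(which.max(p_values))

c(z = z, p = p)
drop_term
```

The same rule is applied in every step that follows: refit the smaller model, look at the new p-values, and drop the least significant AR or MA term.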
### model p = 4, d = 1, q = 3
ARIMA413 <- Arima(training$log_Adj.close, order = c(4, 1, 3), include.drift = TRUE)
summary(ARIMA413)
## Series: training$log_Adj.close
## ARIMA(4,1,3) with drift
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 drift
## -0.5202 -0.1688 0.7615 0.2864 0.4014 0.1952 -0.7185 0.0106
## s.e. 0.2075 0.2143 0.2035 0.1147 0.1923 0.2116 0.1891 0.0094
##
## sigma^2 = 0.004093: log likelihood = 108.65
## AIC=-199.29 AICc=-196.72 BIC=-177.86
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0001364897 0.06032049 0.04816633 0.002002214 1.141745 0.8869469
## ACF1
## Training set 0.003984436
coeftest(ARIMA413)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.520187 0.207484 -2.5071 0.0121720 *
## ar2 -0.168842 0.214318 -0.7878 0.4308073
## ar3 0.761455 0.203473 3.7423 0.0001823 ***
## ar4 0.286442 0.114684 2.4977 0.0125016 *
## ma1 0.401415 0.192256 2.0879 0.0368057 *
## ma2 0.195182 0.211648 0.9222 0.3564249
## ma3 -0.718515 0.189109 -3.7995 0.0001450 ***
## drift 0.010569 0.009396 1.1248 0.2606547
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
One AR term and one MA term are not significant. AR2 has the highest p-value, 0.4308073, so we drop this term from the model in the next step.
### model p = 3, d = 1, q = 3
ARIMA313 <- Arima(training$log_Adj.close, order = c(3, 1, 3), include.drift = TRUE)
summary(ARIMA313)
## Series: training$log_Adj.close
## ARIMA(3,1,3) with drift
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 drift
## 0.6482 0.7182 -0.5771 -0.9484 -0.5408 0.8180 0.0096
## s.e. 0.2572 0.3175 0.1677 0.2735 0.4247 0.2451 0.0102
##
## sigma^2 = 0.003974: log likelihood = 109.74
## AIC=-203.48 AICc=-201.45 BIC=-184.43
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.000445232 0.05984848 0.04597165 0.01122462 1.093784 0.8465336
## ACF1
## Training set -0.009143674
coeftest(ARIMA313)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.6482315 0.2571628 2.5207 0.0117120 *
## ar2 0.7182306 0.3174787 2.2623 0.0236792 *
## ar3 -0.5771159 0.1677336 -3.4407 0.0005803 ***
## ma1 -0.9484376 0.2734755 -3.4681 0.0005242 ***
## ma2 -0.5408088 0.4247258 -1.2733 0.2029071
## ma3 0.8180417 0.2451190 3.3373 0.0008459 ***
## drift 0.0096475 0.0101886 0.9469 0.3436925
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
The MA2 term is not significant and has the highest p-value among the AR and MA terms, 0.2029071, so we drop this term from the previous model in the next step.
### model p = 3, d = 1, q = 2
ARIMA312 <- Arima(training$log_Adj.close, order = c(3, 1, 2), include.drift = TRUE)
summary(ARIMA312)
## Series: training$log_Adj.close
## ARIMA(3,1,2) with drift
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 drift
## -0.2865 0.7583 0.1679 0.1287 -0.6889 0.0107
## s.e. 0.3477 0.2348 0.1395 0.3253 0.2554 0.0091
##
## sigma^2 = 0.004644: log likelihood = 104.45
## AIC=-194.9 AICc=-193.34 BIC=-178.22
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.0001101548 0.06513442 0.05039626 -0.0003218164 1.195042
## MASE ACF1
## Training set 0.9280095 -0.02219786
coeftest(ARIMA312)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.2864539 0.3477100 -0.8238 0.410036
## ar2 0.7582733 0.2347994 3.2295 0.001240 **
## ar3 0.1679470 0.1394598 1.2043 0.228486
## ma1 0.1287263 0.3252842 0.3957 0.692301
## ma2 -0.6888583 0.2553647 -2.6975 0.006985 **
## drift 0.0106558 0.0091212 1.1682 0.242708
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Again, some AR and MA terms are not significant. The MA1 term has the highest p-value, 0.692301, so we drop this term from the previous model in the next step.
### model p = 3, d = 1, q = 1
ARIMA311<- Arima(training$log_Adj.close, order = c(3, 1, 1), include.drift = TRUE)
summary(ARIMA311)
## Series: training$log_Adj.close
## ARIMA(3,1,1) with drift
##
## Coefficients:
## ar1 ar2 ar3 ma1 drift
## 0.5143 0.1317 0.1315 -0.6995 0.0099
## s.e. 0.2109 0.1301 0.1184 0.1806 0.0099
##
## sigma^2 = 0.004516: log likelihood = 104.99
## AIC=-197.98 AICc=-196.83 BIC=-183.69
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0003057396 0.06466655 0.05085581 0.006067046 1.209861 0.9364717
## ACF1
## Training set -0.009775339
coeftest(ARIMA311)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.5143211 0.2109247 2.4384 0.014752 *
## ar2 0.1317039 0.1301457 1.0120 0.311551
## ar3 0.1315252 0.1183873 1.1110 0.266580
## ma1 -0.6994986 0.1805579 -3.8741 0.000107 ***
## drift 0.0099416 0.0099499 0.9992 0.317714
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Two AR terms are not significant. Among the AR and MA terms, AR2 has the highest p-value, 0.311551, so we drop this term from the previous model in the next step.
### model p = 2, d = 1, q = 1
ARIMA211 <- Arima(training$log_Adj.close, order = c(2, 1, 1), include.drift = TRUE)
summary(ARIMA211)
## Series: training$log_Adj.close
## ARIMA(2,1,1) with drift
##
## Coefficients:
## ar1 ar2 ma1 drift
## -1.0395 -0.1206 0.8770 0.0123
## s.e. 0.2861 0.1372 0.2547 0.0064
##
## sigma^2 = 0.004572: log likelihood = 104.03
## AIC=-198.05 AICc=-197.24 BIC=-186.14
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.000123686 0.06549716 0.05068155 -0.01062332 1.198091 0.9332629
## ACF1
## Training set -0.00432817
coeftest(ARIMA211)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -1.0394680 0.2861321 -3.6328 0.0002803 ***
## ar2 -0.1205858 0.1371987 -0.8789 0.3794480
## ma1 0.8770004 0.2547166 3.4430 0.0005752 ***
## drift 0.0122529 0.0064187 1.9090 0.0562681 .
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
The AR2 term is not significant and has the highest p-value, 0.3794480, so we drop this term from the model in the next step.
### model p = 1, d = 1, q = 1
ARIMA111 <- Arima(training$log_Adj.close, order = c(1, 1, 1), include.drift = TRUE)
summary(ARIMA111)
## Series: training$log_Adj.close
## ARIMA(1,1,1) with drift
##
## Coefficients:
## ar1 ma1 drift
## -0.8314 0.7474 0.0120
## s.e. 0.4017 0.4775 0.0071
##
## sigma^2 = 0.004559: log likelihood = 103.62
## AIC=-199.23 AICc=-198.7 BIC=-189.7
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -9.764215e-05 0.06583236 0.05094031 -0.009183767 1.206142
## MASE ACF1
## Training set 0.9380277 -0.07078971
coeftest(ARIMA111)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.831374 0.401676 -2.0698 0.03847 *
## ma1 0.747417 0.477457 1.5654 0.11749
## drift 0.012026 0.007071 1.7008 0.08898 .
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
checkresiduals(ARIMA111)
[Figure: Residuals from ARIMA(1,1,1) with drift. Panels: residuals over time, ACF of the residuals by lag, and histogram of the residuals.]
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1) with drift
## Q* = 9.2513, df = 8, p-value = 0.3215
##
## Model df: 2. Total lags used: 10
The MA1 term is not significant and has the highest p-value, 0.11749, so we drop this term from the model in the next step.
### model p = 1, d = 1, q = 0
ARIMA110 <- Arima(training$log_Adj.close, order = c(1, 1, 0), include.drift = TRUE)
summary(ARIMA110)
## Series: training$log_Adj.close
## ARIMA(1,1,0) with drift
##
## Coefficients:
## ar1 drift
## -0.1705 0.0123
## s.e. 0.1136 0.0063
##
## sigma^2 = 0.004469: log likelihood = 103.91
## AIC=-201.82 AICc=-201.51 BIC=-194.68
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -9.986018e-05 0.06559776 0.05051347 -0.01029073 1.193776 0.9301678
## ACF1
## Training set -0.002278289
coeftest(ARIMA110)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.1705052 0.1135769 -1.5012 0.1333
## drift 0.0122775 0.0063211 1.9423 0.0521 .
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
checkresiduals(ARIMA110)
[Figure: Residuals from ARIMA(1,1,0) with drift. Panels: residuals over time, ACF of the residuals by lag, and histogram of the residuals.]
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0) with drift
## Q* = 7.3749, df = 9, p-value = 0.5981
##
## Model df: 1. Total lags used: 10
The model seems to be a good model, as it has a small AIC value. However, it is always recommended to check ARIMA(0,1,1) as well.
## Series: training$log_Adj.close
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## -0.1626 0.0123
## s.e. 0.1074 0.0062
##
## sigma^2 = 0.004473: log likelihood = 103.87
## AIC=-201.75 AICc=-201.43 BIC=-194.6
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0001226027 0.06563078 0.05059285 -0.01106166 1.195624 0.9316295
## ACF1
## Training set -0.00824293
coeftest(ARIMA011)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.1626465 0.1073594 -1.5150 0.12978
## drift 0.0123500 0.0062058 1.9901 0.04658 *
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
checkresiduals(ARIMA011)
[Figure: Residuals from ARIMA(0,1,1) with drift. Panels: residuals over time, ACF of the residuals by lag, and histogram of the residuals.]
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1) with drift
## Q* = 7.4427, df = 9, p-value = 0.5911
##
## Model df: 1. Total lags used: 10
This model also looks like a good model; it also has a small AIC.
Step XI: AIC and BIC table for the selected good models
BIC_table <- data.frame(cbind(ARIMA010$bic, ARIMA011$bic, ARIMA110$bic, ARIMA111$bic),
row.names = "BIC")
names(BIC_table) <- c("ARIMA(0, 1, 0)", "ARIMA(0, 1, 1)", "ARIMA(1, 1, 0)",
"ARIMA(1, 1, 1)")
BIC_table
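To see how AIC and BIC relate to the log likelihood, the sketch below recomputes both in base R from the log likelihoods printed in the summaries above. It assumes an effective sample size of 80 (81 training observations minus one lost to differencing, as inferred from the outputs) and counts the drift and sigma^2 as estimated parameters. The names `loglik`, `k` and `ic_table` are my own.

```r
# Log likelihoods as reported in the model summaries in this lab
loglik <- c("ARIMA(0, 1, 0)" = 102.80, "ARIMA(0, 1, 1)" = 103.87,
            "ARIMA(1, 1, 0)" = 103.91, "ARIMA(1, 1, 1)" = 103.62)

# Number of estimated parameters: AR and MA coefficients + drift + sigma^2
k <- c(2, 3, 3, 4)

# Effective sample size (assumption: 81 training observations, differenced once)
n <- 80

ic_table <- rbind(AIC = -2 * loglik + 2 * k,
                  BIC = -2 * loglik + k * log(n))
round(ic_table, 2)
```

Up to rounding of the log likelihoods, this reproduces the AIC and BIC values in the summaries. BIC penalizes each extra parameter more heavily than AIC, so the two criteria do not have to agree on the best model: here the smallest AIC belongs to ARIMA(1, 1, 0) and the smallest BIC to ARIMA(0, 1, 0).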
The problem with this forecast is that it is logged. We want to compare the true closing prices of the testing set with the forecast. First, we want to create a plot where we compare the forecasted values, with their prediction intervals, to the true test values. To make this easier, we give you a shortcut: a specially made function that takes a log-price forecast from the forecast package and converts it to a regular price forecast.
First, we use the function “forecast”, which gives us the forecast for four periods ahead.
### model p = 0, d = 1, q = 0
forecast010 <- forecast(ARIMA010, h = 4)
Then I use the “log2price” function to turn the logged forecast into a regular forecast. After that, we can create the plot.
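The course function log2price is not printed in this handout, so its arguments are not shown here. As a rough idea of what such a conversion involves, the sketch below uses a hypothetical helper, `log_to_price_sketch`, which is not the course function. It exponentiates the point forecasts and the interval bounds of an illustrative random walk forecast. All numbers are made up.

```r
# Hypothetical helper, NOT the course's log2price(): exponentiate the point
# forecasts and the interval bounds of a log-price forecast
log_to_price_sketch <- function(mean_log, lower_log, upper_log) {
  data.frame(point = exp(mean_log),
             lower = exp(lower_log),
             upper = exp(upper_log))
}

# Illustrative log-price forecast: random walk with drift, 4 steps ahead
last_log <- log(100)
drift    <- 0.012
sigma    <- 0.067
h        <- 1:4

mean_log <- last_log + h * drift
half_95  <- qnorm(0.975) * sigma * sqrt(h)   # interval widens with sqrt(h)

price_fc <- log_to_price_sketch(mean_log, mean_log - half_95, mean_log + half_95)
price_fc
```

Because the exponential is a monotone function, the interval bounds carry over directly. The exponentiated point forecast is, strictly speaking, the median rather than the mean of the price forecast, and the course function may handle this differently.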
[Figure: Forecasts from ARIMA(0,1,0) with drift.]
I summarize the forecasts for the last 4 points, the testing values, and their differences in a table.
RMSE_ARIMA010<-rmse(ARIMA010_table$testing.Adj.Close,ARIMA010_table$ARIMA010_forecast)
RMSE_ARIMA010
## [1] 5.82799
4. Test for ARCH effects
After selecting the best model based on RMSE, AIC and BIC, you can test the model for ARCH effects among the residuals. To do that, you will need to install the package “FinTS”:
##install.packages("FinTS")
Once the package is installed, you do not need to install it again on your computer. You also do not need to call library() for the command below, which runs the ARCH test, because the FinTS:: prefix tells R where to find the function:
FinTS::ArchTest(ARIMA010$residuals)
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: ARIMA010$residuals
## Chi-squared = 17.831, df = 12, p-value = 0.1209
With a p-value of 0.1209, we cannot reject the null hypothesis of no ARCH effects at the 5% level.
Note that there are two “:” between FinTS and ArchTest. Make sure that you write the command as above. If you try to run it with only one “:”, you will get an error.
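For those who want to see what an ARCH LM test does, here is a base R sketch of the textbook version: regress the squared residuals on their own lags and compare n times R-squared with a chi-squared distribution. The function `arch_lm_sketch` and the simulated residuals are assumptions for illustration, and the result is not guaranteed to match FinTS::ArchTest in every detail.

```r
# Sketch of a textbook ARCH LM test (hypothetical helper, base R only)
arch_lm_sketch <- function(x, lags = 12) {
  x2  <- as.numeric(x)^2
  # Column 1 holds x2_t, the remaining columns hold lags 1 to `lags`
  mat <- embed(x2, lags + 1)
  fit <- lm(mat[, 1] ~ mat[, -1])
  stat <- summary(fit)$r.squared * nrow(mat)
  c(statistic = stat,
    df        = lags,
    p.value   = pchisq(stat, df = lags, lower.tail = FALSE))
}

set.seed(2022)
# Simulated residuals without ARCH effects (illustrative only)
res_demo <- rnorm(80, sd = 0.067)

arch_out <- arch_lm_sketch(res_demo)
arch_out
```

If the squared residuals can be predicted from their own past, R-squared is large, the statistic is large, and the p-value is small, which would point to ARCH effects.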