
Computer Lab 4

Mahmood Ul Hassan and Andriy Andreev

Packages

library(forecast)
library(lmtest)
library(tseries)

Summary

In this lab we will:

1. Do a recap of the last three computer labs.
2. Learn how to use an R function, written for this course, to convert an Arima forecast of logged prices to a forecast of regular prices.
3. Plot this forecast.
4. Test for ARCH effects.

1. Recap

Let’s start with what we have learned so far. Step I: I read in my dataset, which I downloaded from
Yahoo Finance. It contains Google’s monthly stock data from January 2016 until January 2023.

GOOG <- read.csv("GOOG_7.csv")

For simplicity, for this computer lab I keep only the Date and Adj.Close columns, and I make sure
that I have only one observation per month.

GOOG <- GOOG[, -c(2,3,4,5,7)]
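An equivalent and more robust alternative to the positional indexing above is to keep the columns by name; a small sketch, assuming the standard Yahoo Finance headers (Date, Open, High, Low, Close, Adj.Close, Volume):

### alternative to the line above (run one or the other, not both)
GOOG <- GOOG[, c("Date", "Adj.Close")]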

Step II: I describe the time series with a plot. I use the ts() function and create the log price and
the log return.

GOOG$Adj.Close <- ts(GOOG$Adj.Close, start = c(2016, 1), frequency = 12)


### start = c(year, month)

GOOG$log_Adj.close <- log(GOOG$Adj.Close)


GOOG$logreturn_Adj.close <- c(NA, diff(GOOG$log_Adj.close))

GOOG$logreturn_Adj.close <- ts(GOOG$logreturn_Adj.close, start = c(2016, 1), frequency = 12)

ts.plot(GOOG$logreturn_Adj.close)
abline(h = 0, lty = 2, col = "dark red")

[Figure: time series plot of GOOG$logreturn_Adj.close, 2016 to 2023, with a dashed horizontal line at zero.]

Do I see any clear pattern in my diagram? Can I say that my time series is stationary? Is there a way to
test whether the time series is stationary or not?

Step III: I run a regular Dickey-Fuller test.

adf.test(GOOG$logreturn_Adj.close[-(1:2)], k = 0)

## Warning in adf.test(GOOG$logreturn_Adj.close[-(1:2)], k = 0): p-value smaller


## than printed p-value

##
## Augmented Dickey-Fuller Test
##
## data: GOOG$logreturn_Adj.close[-(1:2)]
## Dickey-Fuller = -10.818, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary
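The null hypothesis of adf.test() is that the series has a unit root. Here the reported p-value is at most 0.01 (the warning says the true value is even smaller), so we reject the null and treat the log returns as stationary. Setting k = 0 gives the regular (non-augmented) Dickey-Fuller test; as a sketch, leaving k at its default, trunc((n - 1)^(1/3)), runs the augmented version (output omitted here):

### augmented Dickey-Fuller test with the default lag order
adf.test(GOOG$logreturn_Adj.close[-(1:2)])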

Note about the Dickey-Fuller test: transform your data until the time series is stationary.
Step IV: I create a variable called time that I will need later and I split the time series into a training and a
testing set:

GOOG$time <- 1:nrow(GOOG)

training <- GOOG[1:(nrow(GOOG) - 4), ]


testing <- GOOG[(nrow(GOOG) - 3): nrow(GOOG), ]
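A quick sanity check on the split (with 85 monthly rows in total, the sizes should be 81 and 4):

### check the sizes of the training and testing sets
nrow(training)
nrow(testing)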

Step V: I estimate a random walk using the training set. Let us create a forecast assuming that the log price
is a random walk with drift. This means that we will fit an ARIMA(0,1,0) to the log price. Alternatively,
we could fit an ARIMA(0,0,0) to the log return, but that would make forecasting a little harder.
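In equation form, the random walk with drift says log P(t) = c + log P(t-1) + e(t), so the log return is Delta log P(t) = c + e(t): white noise around the drift c. This is exactly what ARIMA(0,1,0) with drift estimates on the log price.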

ARIMA010 <- Arima(training$log_Adj.close, order = c(0, 1, 0), include.drift =T)


summary(ARIMA010)

## Series: training$log_Adj.close
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 0.0119
## s.e. 0.0075
##
## sigma^2 = 0.004538: log likelihood = 102.8
## AIC=-201.6 AICc=-201.45 BIC=-196.84
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 4.448157e-05 0.06652704 0.05162596 -0.005778455 1.224137 0.9506535
## ACF1
## Training set -0.1613332

### print the confidence interval


confint(ARIMA010)

## 2.5 % 97.5 %
## drift -0.002782661 0.02655799

### print the p-values


coeftest(ARIMA010)

##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## drift 0.011888 0.007485 1.5882 0.1122

Step VI: Check the residuals of the fitted model.

checkresiduals(ARIMA010)

[Figure: checkresiduals() output for ARIMA(0,1,0) with drift: residual time plot, residual ACF, and residual histogram.]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,0) with drift
## Q* = 9.2848, df = 10, p-value = 0.5053
##
## Model df: 0. Total lags used: 10

The Ljung-Box test does not reject the null hypothesis of uncorrelated residuals (p-value 0.5053), so the random walk with drift passes this diagnostic.

Step VII: We estimate a regression model using time as an explanatory variable.

regression <- lm(formula = Adj.Close ~ time, data = training)


summary(regression)

##
## Call:
## lm(formula = Adj.Close ~ time, data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.107 -11.687 1.010 6.737 39.004
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.23703 3.36707 6.01 5.4e-08 ***
## time 1.27185 0.07134 17.83 < 2e-16 ***

## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 15.01 on 79 degrees of freedom
## Multiple R-squared: 0.8009, Adjusted R-squared: 0.7984
## F-statistic: 317.8 on 1 and 79 DF, p-value: < 2.2e-16

### forecast four periods forward


### you need to create a data frame with the time for which you want to predict
forward_data <- data.frame(time = 82:85)
regression_forecast <- predict.lm(regression, newdata = forward_data)

### how to plot the regression model with the training set and
### including the forecast and the testing set
plot((1:nrow(GOOG)), GOOG$Adj.Close, type = "l", xlim = c(0, 85))
lines((1:(nrow(GOOG) - 4)), regression$fitted.values, col = "dark blue", lwd = 2, lty = 2)
lines(82:85, testing$Adj.Close, type = "o", col = "dark red", lwd = 2)
lines(82:85, regression_forecast, type = "o", col = "dark blue", lwd = 2)
[Figure: GOOG adjusted close (black) with the fitted regression line (dashed blue), the four-step regression forecast (blue), and the testing values (red).]

regression_forecast_error <- testing$Adj.Close - regression_forecast

regression_table <- data.frame(cbind(testing$Adj.Close, regression_forecast, regression_forecast_error))


names(regression_table) <- c("testing", "estimated", "residuals")
regression_table

## testing estimated residuals
## 1 94.66 124.5290 -29.86901
## 2 101.45 125.8009 -24.35087
## 3 88.73 127.0727 -38.34272
## 4 99.87 128.3446 -28.47457

Calculate RMSE for the regression model using the above table.
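For reference, RMSE is the square root of the mean of the squared forecast errors; a minimal sketch computing it by hand from the residuals column of the table above:

### RMSE by hand: square root of the mean squared forecast error
sqrt(mean(regression_table$residuals^2))

The rmse() function from the Metrics package, loaded below, performs the same calculation.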

library(Metrics)

##
## Attaching package: ’Metrics’
##
## The following object is masked from ’package:forecast’:
##
##     accuracy

RMSE_regression <- rmse(regression_table$testing, regression_table$estimated)
RMSE_regression

## [1] 30.68423

Step VIII: Fit Holt’s smoothing model
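For reference, Holt's linear method maintains a level l(t) and a trend b(t), updated as l(t) = alpha * y(t) + (1 - alpha) * (l(t-1) + b(t-1)) and b(t) = beta * (l(t) - l(t-1)) + (1 - beta) * b(t-1); the h-step-ahead forecast is l(t) + h * b(t). The estimated smoothing parameters are shown in the summary below.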

holt_model <-holt(training$Adj.Close, h = 4, exponential = FALSE)


summary(holt_model)

##
## Forecast method: Holt’s method
##
## Model Information:
## Holt’s method
##
## Call:
## holt(y = training$Adj.Close, h = 4, exponential = FALSE)
##
## Smoothing parameters:
## alpha = 0.5209
## beta = 0.2279
##
## Initial states:
## l = 36.3075
## b = -0.1851
##
## sigma: 5.7216
##
## AIC AICc BIC
## 644.4166 645.2166 656.3888
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.2593424 5.57854 3.94528 -0.1856814 5.374785 0.9461912
## ACF1
## Training set 0.04402264
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 82 95.13739 87.80486 102.46992 83.92325 106.3515
## 83 90.16544 81.00525 99.32562 76.15614 104.1747
## 84 85.19349 73.56632 96.82065 67.41128 102.9757
## 85 80.22153 65.62033 94.82274 57.89092 102.5521

plot(holt_model)
lines((nrow(GOOG) - 3):nrow(GOOG), testing$Adj.Close, col = "dark red", lwd = 2, type = "o")

[Figure: forecasts from Holt's method with 80% and 95% prediction intervals, with the testing values overlaid in red.]

### forecast four periods forward


holt_forecast <- holt_model$mean
holt_forecast_error <- testing$Adj.Close - holt_forecast
### summarize the calculations in a table
holt_table <-data.frame(cbind(testing$Adj.Close, holt_forecast, holt_forecast_error))
names(holt_table) <- c("testing", "Holt's forecast",
"Holt's forecast error")

holt_table

##   testing Holt’s forecast Holt’s forecast error
## 1   94.66        95.13739            -0.4773835
## 2  101.45        90.16544            11.2845606
## 3   88.73        85.19349             3.5365177
## 4   99.87        80.22153            19.6484688

Calculate RMSE for Holt’s model using the above table.

RMSE_Holt <- rmse(holt_table$testing, holt_table$`Holt's forecast`)
RMSE_Holt

## [1] 11.46885

Step IX: Acf and Pacf plots

Acf(training$logreturn_Adj.close, main = "ACF", lag.max=12)

[Figure: ACF of the training set log returns, lags 1 to 12.]

Pacf(training$logreturn_Adj.close, main = "PACF",lag.max=12)

[Figure: PACF of the training set log returns, lags 1 to 12.]

Step X: Fit ARIMA models

Since there is no reason to believe that the current stock value depends on the stock value from 11 months
ago, we will start with an ARIMA(4,1,4) model and try to find a good model based on p-values and AIC.
During the model selection process, we only need to check the residuals of the good candidate models.
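As a cross-check on the manual search below (not a substitute for it), the forecast package can also select the orders automatically; a minimal sketch, fixing d = 1 and ruling out seasonal terms (output omitted here):

### automatic order selection by AICc, for comparison with the manual search
auto_fit <- auto.arima(training$log_Adj.close, d = 1, seasonal = FALSE)
summary(auto_fit)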

### model p = 4, d = 1, q = 4
ARIMA414 <- Arima(training$log_Adj.close, order = c(4, 1, 4), include.drift =T)
summary(ARIMA414)

## Series: training$log_Adj.close
## ARIMA(4,1,4) with drift
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## 0.5028 -0.0516 0.8321 -0.6155 -0.6951 0.0119 -0.7309 0.9673
## s.e. 0.1569 0.1050 0.1052 0.1563 0.1692 0.1603 0.1176 0.1364
## drift
## 0.0097
## s.e. 0.0103
##
## sigma^2 = 0.003702: log likelihood = 112.17
## AIC=-204.34 AICc=-201.15 BIC=-180.52
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0005269856 0.05696635 0.04394368 0.01383662 1.046896 0.80919
## ACF1
## Training set -0.0738096

coeftest(ARIMA414)

##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.5027654 0.1568676 3.2050 0.00135 **
## ar2 -0.0516019 0.1050358 -0.4913 0.62323
## ar3 0.8321288 0.1052306 7.9077 2.622e-15 ***
## ar4 -0.6154997 0.1563331 -3.9371 8.247e-05 ***
## ma1 -0.6950654 0.1691727 -4.1086 3.980e-05 ***
## ma2 0.0118910 0.1603453 0.0742 0.94088
## ma3 -0.7308628 0.1176463 -6.2124 5.219e-10 ***
## ma4 0.9672586 0.1364291 7.0898 1.343e-12 ***
## drift 0.0097372 0.0103336 0.9423 0.34605
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

One AR term and one MA term are not significant in the model. MA2 has the highest p-value (0.94088),
so we will drop it first.

### model p = 4, d = 1, q = 3
ARIMA413 <- Arima(training$log_Adj.close, order = c(4, 1, 3), include.drift = T)
summary(ARIMA413)

## Series: training$log_Adj.close
## ARIMA(4,1,3) with drift
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 drift
## -0.5202 -0.1688 0.7615 0.2864 0.4014 0.1952 -0.7185 0.0106
## s.e. 0.2075 0.2143 0.2035 0.1147 0.1923 0.2116 0.1891 0.0094
##
## sigma^2 = 0.004093: log likelihood = 108.65
## AIC=-199.29 AICc=-196.72 BIC=-177.86
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0001364897 0.06032049 0.04816633 0.002002214 1.141745 0.8869469
## ACF1
## Training set 0.003984436

coeftest(ARIMA413)

##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)

## ar1 -0.520187 0.207484 -2.5071 0.0121720 *
## ar2 -0.168842 0.214318 -0.7878 0.4308073
## ar3 0.761455 0.203473 3.7423 0.0001823 ***
## ar4 0.286442 0.114684 2.4977 0.0125016 *
## ma1 0.401415 0.192256 2.0879 0.0368057 *
## ma2 0.195182 0.211648 0.9222 0.3564249
## ma3 -0.718515 0.189109 -3.7995 0.0001450 ***
## drift 0.010569 0.009396 1.1248 0.2606547
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Again, one AR term and one MA term are not significant. AR2 has the highest p-value (0.4308073),
so we will drop it from the model in the next step.

### model p = 3, d = 1, q = 3
ARIMA313 <- Arima(training$log_Adj.close, order = c(3, 1, 3), include.drift = TRUE)
summary(ARIMA313)

## Series: training$log_Adj.close
## ARIMA(3,1,3) with drift
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 drift
## 0.6482 0.7182 -0.5771 -0.9484 -0.5408 0.8180 0.0096
## s.e. 0.2572 0.3175 0.1677 0.2735 0.4247 0.2451 0.0102
##
## sigma^2 = 0.003974: log likelihood = 109.74
## AIC=-203.48 AICc=-201.45 BIC=-184.43
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.000445232 0.05984848 0.04597165 0.01122462 1.093784 0.8465336
## ACF1
## Training set -0.009143674

coeftest(ARIMA313)

##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.6482315 0.2571628 2.5207 0.0117120 *
## ar2 0.7182306 0.3174787 2.2623 0.0236792 *
## ar3 -0.5771159 0.1677336 -3.4407 0.0005803 ***
## ma1 -0.9484376 0.2734755 -3.4681 0.0005242 ***
## ma2 -0.5408088 0.4247258 -1.2733 0.2029071
## ma3 0.8180417 0.2451190 3.3373 0.0008459 ***
## drift 0.0096475 0.0101886 0.9469 0.3436925
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

The MA2 term is not significant and has the highest p-value (0.2029071), so we will drop it from the
previous model in the next step.

### model p = 3, d = 1, q = 2
ARIMA312 <- Arima(training$log_Adj.close, order = c(3, 1, 2), include.drift = TRUE)
summary(ARIMA312)

## Series: training$log_Adj.close
## ARIMA(3,1,2) with drift
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 drift
## -0.2865 0.7583 0.1679 0.1287 -0.6889 0.0107
## s.e. 0.3477 0.2348 0.1395 0.3253 0.2554 0.0091
##
## sigma^2 = 0.004644: log likelihood = 104.45
## AIC=-194.9 AICc=-193.34 BIC=-178.22
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.0001101548 0.06513442 0.05039626 -0.0003218164 1.195042
## MASE ACF1
## Training set 0.9280095 -0.02219786

coeftest(ARIMA312)

##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.2864539 0.3477100 -0.8238 0.410036
## ar2 0.7582733 0.2347994 3.2295 0.001240 **
## ar3 0.1679470 0.1394598 1.2043 0.228486
## ma1 0.1287263 0.3252842 0.3957 0.692301
## ma2 -0.6888583 0.2553647 -2.6975 0.006985 **
## drift 0.0106558 0.0091212 1.1682 0.242708
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Again, some AR and MA terms are not significant. MA1 has the highest p-value (0.692301), so we will
drop it from the previous model in the next step.

### model p = 3, d = 1, q = 1
ARIMA311<- Arima(training$log_Adj.close, order = c(3, 1, 1), include.drift = TRUE)
summary(ARIMA311)

## Series: training$log_Adj.close
## ARIMA(3,1,1) with drift
##
## Coefficients:
## ar1 ar2 ar3 ma1 drift
## 0.5143 0.1317 0.1315 -0.6995 0.0099
## s.e. 0.2109 0.1301 0.1184 0.1806 0.0099
##
## sigma^2 = 0.004516: log likelihood = 104.99
## AIC=-197.98 AICc=-196.83 BIC=-183.69
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0003057396 0.06466655 0.05085581 0.006067046 1.209861 0.9364717
## ACF1
## Training set -0.009775339

coeftest(ARIMA311)

##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.5143211 0.2109247 2.4384 0.014752 *
## ar2 0.1317039 0.1301457 1.0120 0.311551
## ar3 0.1315252 0.1183873 1.1110 0.266580
## ma1 -0.6994986 0.1805579 -3.8741 0.000107 ***
## drift 0.0099416 0.0099499 0.9992 0.317714
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Two AR terms are not significant. AR2 has the highest p-value (0.311551), so we will drop it in the
next step.

### model p = 2, d = 1, q = 1
ARIMA211 <- Arima(training$log_Adj.close, order = c(2, 1, 1), include.drift = TRUE)
summary(ARIMA211)

## Series: training$log_Adj.close
## ARIMA(2,1,1) with drift
##
## Coefficients:
## ar1 ar2 ma1 drift
## -1.0395 -0.1206 0.8770 0.0123
## s.e. 0.2861 0.1372 0.2547 0.0064
##
## sigma^2 = 0.004572: log likelihood = 104.03
## AIC=-198.05 AICc=-197.24 BIC=-186.14
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.000123686 0.06549716 0.05068155 -0.01062332 1.198091 0.9332629
## ACF1
## Training set -0.00432817

coeftest(ARIMA211)

##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -1.0394680 0.2861321 -3.6328 0.0002803 ***
## ar2 -0.1205858 0.1371987 -0.8789 0.3794480
## ma1 0.8770004 0.2547166 3.4430 0.0005752 ***
## drift 0.0122529 0.0064187 1.9090 0.0562681 .
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

The AR2 term is not significant and has the highest p-value (0.3794480), so we will drop it in the
next step.

### model p = 1, d = 1, q = 1
ARIMA111 <- Arima(training$log_Adj.close, order = c(1, 1, 1), include.drift = TRUE)
summary(ARIMA111)

## Series: training$log_Adj.close
## ARIMA(1,1,1) with drift
##
## Coefficients:
## ar1 ma1 drift
## -0.8314 0.7474 0.0120
## s.e. 0.4017 0.4775 0.0071
##
## sigma^2 = 0.004559: log likelihood = 103.62
## AIC=-199.23 AICc=-198.7 BIC=-189.7
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -9.764215e-05 0.06583236 0.05094031 -0.009183767 1.206142
## MASE ACF1
## Training set 0.9380277 -0.07078971

coeftest(ARIMA111)

##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.831374 0.401676 -2.0698 0.03847 *
## ma1 0.747417 0.477457 1.5654 0.11749
## drift 0.012026 0.007071 1.7008 0.08898 .
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

checkresiduals(ARIMA111)

[Figure: checkresiduals() output for ARIMA(1,1,1) with drift: residual time plot, residual ACF, and residual histogram.]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1) with drift
## Q* = 9.2513, df = 8, p-value = 0.3215
##
## Model df: 2. Total lags used: 10

The MA1 term is not significant and has the highest p-value (0.11749), so we will drop it in the
next step.

### model p = 1, d = 1, q = 0
ARIMA110 <- Arima(training$log_Adj.close, order = c(1, 1, 0), include.drift = TRUE)
summary(ARIMA110)

## Series: training$log_Adj.close
## ARIMA(1,1,0) with drift
##
## Coefficients:
## ar1 drift
## -0.1705 0.0123
## s.e. 0.1136 0.0063
##
## sigma^2 = 0.004469: log likelihood = 103.91
## AIC=-201.82 AICc=-201.51 BIC=-194.68
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -9.986018e-05 0.06559776 0.05051347 -0.01029073 1.193776 0.9301678
## ACF1
## Training set -0.002278289

coeftest(ARIMA110)

##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.1705052 0.1135769 -1.5012 0.1333
## drift 0.0122775 0.0063211 1.9423 0.0521 .
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

checkresiduals(ARIMA110)

[Figure: checkresiduals() output for ARIMA(1,1,0) with drift: residual time plot, residual ACF, and residual histogram.]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0) with drift
## Q* = 7.3749, df = 9, p-value = 0.5981
##
## Model df: 1. Total lags used: 10

This seems to be a good model, as it has a small AIC. But it is always recommended to check
ARIMA(0,1,1) as well.

### model p = 0, d = 1, q = 1

ARIMA011 <- Arima(training$log_Adj.close, order = c(0, 1, 1), include.drift =T)
summary(ARIMA011)

## Series: training$log_Adj.close
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## -0.1626 0.0123
## s.e. 0.1074 0.0062
##
## sigma^2 = 0.004473: log likelihood = 103.87
## AIC=-201.75 AICc=-201.43 BIC=-194.6
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0001226027 0.06563078 0.05059285 -0.01106166 1.195624 0.9316295
## ACF1
## Training set -0.00824293

coeftest(ARIMA011)

##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.1626465 0.1073594 -1.5150 0.12978
## drift 0.0123500 0.0062058 1.9901 0.04658 *
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

checkresiduals(ARIMA011)

[Figure: checkresiduals() output for ARIMA(0,1,1) with drift: residual time plot, residual ACF, and residual histogram.]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1) with drift
## Q* = 7.4427, df = 9, p-value = 0.5911
##
## Model df: 1. Total lags used: 10

This model also looks good; it too has a small AIC.
Step XI: AIC and BIC table for the selected good models

AIC_table <- data.frame(cbind(ARIMA010$aic, ARIMA011$aic, ARIMA110$aic, ARIMA111$aic),
                        row.names = "AIC")
names(AIC_table) <- c("ARIMA(0, 1, 0)", "ARIMA(0, 1, 1)", "ARIMA(1, 1, 0)",
                      "ARIMA(1, 1, 1)")
AIC_table

##     ARIMA(0, 1, 0) ARIMA(0, 1, 1) ARIMA(1, 1, 0) ARIMA(1, 1, 1)
## AIC      -201.6024      -201.7459      -201.8237      -199.2316

Repeat the same for BIC

BIC_table <- data.frame(cbind(ARIMA010$bic, ARIMA011$bic, ARIMA110$bic, ARIMA111$bic),
row.names = "BIC")
names(BIC_table) <- c("ARIMA(0, 1, 0)", "ARIMA(0, 1, 1)", "ARIMA(1, 1, 0)",
"ARIMA(1, 1, 1)")
BIC_table

##     ARIMA(0, 1, 0) ARIMA(0, 1, 1) ARIMA(1, 1, 0) ARIMA(1, 1, 1)
## BIC      -196.8384      -194.5998      -194.6776      -189.7035

2. R function for the transformation

Among these candidates, ARIMA(1,1,0) has the smallest AIC and ARIMA(0,1,0) has the smallest BIC; the
values are close, and below we continue with ARIMA(0,1,0). The problem with this forecast is that it is
logged, while we want to compare the true closing prices of the testing set with the forecast. First, we
want to create a plot comparing the forecasted values, with their prediction intervals, to the true test
values. To make this easier, we will give you a shortcut: a specially made function that takes a log-price
forecast from the forecast package and converts it to a regular price forecast.

log2price <- function(a.forecast){
  ### back-transform the point forecasts to the price scale
  a.forecast$mean <- exp(a.forecast$mean)
  ### back-transform the upper limits of the 80% and 95% prediction intervals
  a.forecast$upper <- exp(a.forecast$upper)
  ### back-transform the lower limits of the 80% and 95% prediction intervals
  a.forecast$lower <- exp(a.forecast$lower)
  ### back-transform the original time series
  a.forecast$x <- exp(a.forecast$x)
  return(a.forecast)
}
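As an aside (not used in this lab), a similar back-transformation can be obtained from the forecast package itself by fitting on the price scale with the Box-Cox parameter lambda = 0, which is a log transform; a sketch, with the object name ARIMA010_bc chosen here only for illustration:

### alternative: let the package handle the log/exp bookkeeping itself;
### biasadj = TRUE returns mean forecasts rather than medians on the price scale
ARIMA010_bc <- Arima(training$Adj.Close, order = c(0, 1, 0),
                     include.drift = TRUE, lambda = 0, biasadj = TRUE)
forecast(ARIMA010_bc, h = 4)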

First, we will use the “forecast” function, which gives us the forecast four periods forward.

### model p = 0, d = 1, q = 0
forecast010 <- forecast(ARIMA010, h = 4)

3. Plotting the True Forecast

I will use the “log2price” function to turn the logged forecast into a regular price forecast, and then
create the plot.

priceforecast010 <- log2price(forecast010)


plot(priceforecast010)
lines(82:85, testing$Adj.Close, type = "o", col = "dark red", lwd = 2)

[Figure: price-scale forecasts from ARIMA(0,1,0) with drift, with prediction intervals, and the testing values overlaid in red.]

I will summarize the forecasts for the last 4 points, together with the testing values and their differences, in a table.

### summarize the calculations in a table


ARIMA010_forecast <- exp(forecast010$mean)
ARIMA010_forecast_error <- testing$Adj.Close - ARIMA010_forecast
ARIMA010_table <- data.frame(cbind(testing$Adj.Close, ARIMA010_forecast, ARIMA010_forecast_error))
ARIMA010_table

##   testing.Adj.Close ARIMA010_forecast ARIMA010_forecast_error
## 1             94.66          97.29982              -2.6398176
## 2            101.45          98.46339               2.9866055
## 3             88.73          99.64088             -10.9108730
## 4             99.87         100.83244              -0.9624386

Calculate RMSE for the ARIMA(0, 1, 0) model using the above table.

RMSE_ARIMA010<-rmse(ARIMA010_table$testing.Adj.Close,ARIMA010_table$ARIMA010_forecast)
RMSE_ARIMA010

## [1] 5.82799

Repeat the same procedure for the other ARIMA models.


Compare the RMSE of all the estimated models; the lowest RMSE corresponds to the best model.
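A minimal sketch of such a comparison, collecting the RMSE values computed above into one table (the remaining ARIMA models can be added in the same way once their forecast tables are built):

### out-of-sample RMSE comparison across the fitted models
rmse_table <- data.frame("Regression" = RMSE_regression,
                         "Holt" = RMSE_Holt,
                         "ARIMA(0, 1, 0)" = RMSE_ARIMA010,
                         row.names = "RMSE", check.names = FALSE)
rmse_table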

4. Test for ARCH effects

After selecting the best model based on RMSE, AIC and BIC, you can test its residuals for ARCH effects.
To do that you will need to install the package “FinTS”.

## install.packages("FinTS")

Once the package is installed, you do not need to install it again on your computer, and you do not need
to load any library for the command below, which runs the ARCH test.

FinTS::ArchTest(ARIMA010$residuals)

##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: ARIMA010$residuals
## Chi-squared = 17.831, df = 12, p-value = 0.1209
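The p-value (0.1209) is above 0.05, so we do not reject the null hypothesis: there is no significant evidence of ARCH effects in the residuals. For intuition, the LM test regresses the squared (demeaned) residuals on their own lags and compares n * R^2 with a chi-squared distribution; a hand-rolled sketch with 12 lags, mirroring the output above:

### ARCH LM test by hand, 12 lags
r <- as.numeric(residuals(ARIMA010))
e2 <- (r - mean(r))^2               # squared demeaned residuals
X <- embed(e2, 13)                  # column 1: e2 at time t; columns 2-13: lags 1-12
LM <- nrow(X) * summary(lm(X[, 1] ~ X[, -1]))$r.squared
1 - pchisq(LM, df = 12)             # p-value, compare with 0.1209 above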

Note that there are two colons (“::”) between FinTS and ArchTest. Make sure that you write the command
as above; if you run it with only one “:”, you will get an error.
