Data Analysis Activity 3

The document analyzes an industrial production time series using ARIMA modeling. It finds that the time series is non-stationary and applies a log transformation followed by first differencing. The ACF and PACF plots suggest an AR(1) and MA(1) process. The BIC selects the ARIMA(1,1,0) model as the best fit. Model diagnostics show that the residuals are approximately uncorrelated and roughly normal. The model is used to forecast industrial production for the next 12 quarters.

Uploaded by

IncreDABels

Jay Kapoor

STAT 4534
Dr. Marco A. R. Ferreira
17th Nov, 2022
Data Analysis Activity 3
Task 1

• We can observe that the time series is not stationary, as its level changes over time.
• We can apply a log transformation to this data set.
• There is an upward trend in the time series, which needs to be removed by differencing the
log-transformed series.
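As a rough illustration of why the appendix takes the log and then the first difference, the sketch below applies both steps to a synthetic series. The series `x` and its 2% growth rate are assumptions made up for this example; they are not the industrial production data.

```r
# Synthetic quarterly series growing 2% per quarter (illustrative only)
x <- ts(100 * 1.02^(0:39), start = 1980, frequency = 4)

log.x <- log(x)            # the log turns exponential growth into a straight line
diff.log.x <- diff(log.x)  # the first difference removes that linear trend

# Every differenced value equals log(1.02), the constant quarterly growth rate
growth <- as.numeric(diff.log.x)
print(range(growth))
print(log(1.02))
```

On real data the differenced log series is not constant, but it can be read in the same way, as an approximate quarter-to-quarter growth rate.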

Task 2
Plotting the sample ACF and sample PACF of the differenced time series
The PACF cuts off at lag 1, so the preliminary AR order is p = 1; a preliminary MA order of q = 1
is also possible. The differenced time series therefore looks like an AR(1), possibly with an MA(1) component.
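The "PACF cuts off at lag 1" pattern can be checked on a simulated AR(1) process. This is a minimal sketch using base R only; the coefficient 0.6, the sample size, and the seed are arbitrary assumptions, not estimates from the report.

```r
set.seed(4534)
n <- 500
# Simulated AR(1) with coefficient 0.6 (value chosen only for illustration)
y <- arima.sim(model = list(ar = 0.6), n = n)

sample.pacf <- pacf(y, lag.max = 10, plot = FALSE)$acf[, 1, 1]
sample.acf  <- acf(y, lag.max = 10, plot = FALSE)$acf[-1, 1, 1]  # drop lag 0

threshold <- 1.96 / sqrt(n)   # approximate 95% band drawn on ACF/PACF plots

round(sample.pacf, 2)                 # lag 1 should stand out
which(abs(sample.pacf) > threshold)   # lags whose PACF exceeds the band
round(sample.acf, 2)                  # ACF decays gradually instead of cutting off
```

For an AR(1) process the theoretical PACF is zero beyond lag 1, while the ACF decays geometrically; a few later lags can still cross the band by chance.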

Task 3

The BIC selects ARIMA(1,1,0) as the best model, as we suspected in the previous task. This makes us
confident in using ARIMA(1,1,0).

• AIC favors ARIMA(1,1,1), but ARIMA(1,1,0) is a very close second.

• BIC favors ARIMA(1,1,0).

• The AIC and BIC results do not agree exactly, but together they still inform the choice of
model.

• Based on these results, I choose ARIMA(1,1,0) as the best fit for our data.
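The two criteria compared above differ only in their penalty term, which is why they can rank models differently. The sketch below computes both with the same formulas as the appendix, on a simulated series; `x.sim` and the helper `info.criteria` are hypothetical names introduced for this example.

```r
set.seed(4534)
# Simulated ARIMA(1,1,0) series standing in for the log series (assumption)
x.sim <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 160)
n.sim <- length(x.sim)

# Hypothetical helper: information criteria from a fitted arima object
info.criteria <- function(fit, n) {
  k <- length(fit$coef) + 1          # coefficients plus the innovation variance
  c(AIC = -2 * fit$loglik + 2 * k,
    BIC = -2 * fit$loglik + log(n) * k)
}

fit110 <- arima(x.sim, order = c(1, 1, 0), method = "CSS-ML")
fit111 <- arima(x.sim, order = c(1, 1, 1), method = "CSS-ML")

rbind("ARIMA(1,1,0)" = info.criteria(fit110, n.sim),
      "ARIMA(1,1,1)" = info.criteria(fit111, n.sim))
```

Because log(n) exceeds 2 whenever n is larger than about 7, BIC charges more per parameter than AIC and tends to prefer the smaller model when the two fits are close.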

Task 4 - performing model diagnostics


• The ACF of the residuals shows little autocorrelation: the threshold is crossed only once
in the ACF plot.

• The assumption of normality of the residuals is reasonable, as most points follow the reference
line of the Q-Q plot. Some points depart from the line, giving an S-shaped curve, which may be
attributed to the nature of the data (industrial production).

• The p-values of the Ljung-Box statistic lie on or above the significance line, and none falls
far below it, so we do not reject the hypothesis that the residuals are
uncorrelated.
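The graphical checks above have numerical counterparts in base R. This is a sketch on a simulated series, not the report's data; the choice of lag 10 for the Ljung-Box test and the use of the Shapiro-Wilk test are assumptions made for illustration.

```r
set.seed(4534)
# Simulated ARIMA(1,1,0) series standing in for the log series (assumption)
x.sim <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 160)
fit <- arima(x.sim, order = c(1, 1, 0), method = "CSS-ML")
res <- residuals(fit)

# Ljung-Box test at lag 10; fitdf = 1 accounts for the one estimated AR coefficient
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
lb$p.value

# Share of residual autocorrelations outside the approximate 95% band
r <- acf(res, lag.max = 20, plot = FALSE)$acf[-1, 1, 1]
mean(abs(r) > 1.96 / sqrt(length(res)))

# Shapiro-Wilk test as a numerical companion to the normal Q-Q plot
shapiro.test(res)$p.value
```

A large Ljung-Box p-value means there is no evidence against uncorrelated residuals; it does not prove that they are uncorrelated.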

Task 5 - Forecasting the model

Plotting the forecast for industrial production time series 12 quarters ahead

Prediction for the next quarters

The 95% prediction interval for the fourth quarter of 2022
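The interval is built on the log scale and then mapped back with exp(), as in the appendix. The sketch below reproduces that calculation with base R's predict() on a simulated log series; the series, its level, and its noise scale are assumptions for illustration only.

```r
set.seed(4534)
# Simulated log-scale series with 160 quarterly values (assumption)
sim <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 159, sd = 0.01)
log.x <- ts(4 + as.numeric(sim), start = 1980, frequency = 4)

fit <- arima(log.x, order = c(1, 1, 0), method = "CSS-ML")
fc <- predict(fit, n.ahead = 12)   # forecasts and standard errors, log scale

z <- qnorm(0.975)                  # about 1.96
interval <- cbind(lower = as.numeric(exp(fc$pred - z * fc$se)),
                  point = as.numeric(exp(fc$pred)),
                  upper = as.numeric(exp(fc$pred + z * fc$se)))

interval[12, ]   # 12 quarters ahead: the last forecast horizon
```

The standard errors grow with the horizon, so the interval widens for later quarters. After exponentiating, the interval is no longer symmetric around the point forecast.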


Appendix

#Jay Kapoor
#Stat 4534 Time Series Analysis
#DAA 3

library(readr)
library(astsa)
library(polynom)

#Task 1
le <- read_csv("D:/STAT 4534 time series/Analysis Activity 3/IPB50001SQ-1980-2019.csv")
industrial.production <- ts(le[,2],start=1980,frequency=4)
tsplot(industrial.production)

log.ip <- log(industrial.production)


tsplot(log.ip)

#Removing the trend


diff.log.ip <- diff(log.ip)
tsplot(diff.log.ip)

#Task2

acf2(diff.log.ip,max.lag=40)

# Task 3

n = length(log.ip)
max.p = 5
max.d = 1
max.q = 5
max.P = 0
max.D = 0
max.Q = 0
BIC.array =array(NA,dim=c(max.p+1,max.d+1,max.q+1,max.P+1,max.D+1,max.Q+1))
AIC.array =array(NA,dim=c(max.p+1,max.d+1,max.q+1,max.P+1,max.D+1,max.Q+1))
best.bic <- 1e8

x.ts = log.ip
for (p in 0:max.p) for (d in 0:max.d) for (q in 0:max.q)
for (P in 0:max.P) for (D in 0:max.D) for (Q in 0:max.Q)
{
  # This is a modification of a function originally from the book:
  # Cowpertwait, P.S.P., Metcalfe, A.V. (2009), Introductory Time
  # Series with R, Springer.
  # Modified by M.A.R. Ferreira (2016, 2020).
  cat("p=",p,", d=",d,", q=",q,", P=",P,", D=",D,", Q=",Q,"\n")

  fit <- tryCatch(
    {
      arima(x.ts, order = c(p,d,q),
            seasonal = list(order = c(P,D,Q), period = frequency(x.ts)),
            method = "CSS-ML")
    },
    error = function(cond){
      message("Original error message:")
      message(cond)
      # Return NA so that failed fits are skipped below
      return(NA)
    }
  )
  # A successful fit is a list of class "Arima". is.na() on a list returns a
  # vector, which if() rejects in R >= 4.2, so test the class instead.
  if (inherits(fit, "Arima")) {
    number.parameters <- length(fit$coef) + 1
    BIC.array[p+1,d+1,q+1,P+1,D+1,Q+1] = -2*fit$loglik + log(n)*number.parameters
    AIC.array[p+1,d+1,q+1,P+1,D+1,Q+1] = -2*fit$loglik + 2*number.parameters

    # Keep track of the model with the smallest BIC so far
    if (BIC.array[p+1,d+1,q+1,P+1,D+1,Q+1] < best.bic)
    {
      best.bic <- BIC.array[p+1,d+1,q+1,P+1,D+1,Q+1]
      best.fit <- fit
      best.model <- c(p,d,q,P,D,Q)
    }
  }
}

best.bic
best.fit
best.model

#Task4

arima211 <- arima(x.ts, order = c(2,1,1), method="CSS-ML")


BIC.arima211 = -2*arima211$loglik + log(n)*(length(arima211$coef) + 1)
BIC.arima211
arima111 <- arima(x.ts, order = c(1,1,1), method="CSS-ML")
BIC.arima111 = -2*arima111$loglik + log(n)*(length(arima111$coef) + 1)
BIC.arima111
arima110 <- arima(x.ts, order = c(1,1,0), method="CSS-ML")
BIC.arima110 = -2*arima110$loglik + log(n)*(length(arima110$coef) + 1)
BIC.arima110

AIC(arima211)
AIC(arima111)
AIC(arima110)

# So, the best model according to BIC is the ARIMA(1,1,0)


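# Model diagnostics: fit to the undifferenced log series, because d = 1
# already applies the first difference (diff.log.ip would be differenced twice)
fit.110 <- sarima(log.ip,1,1,0)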

#Task 5

par(mfrow=c(1,1))
ip.pred <- sarima.for(log.ip,12,1,1,0)
ip.pred$pred

ip.pred$se

exp(ip.pred$pred-1.96* ip.pred$se)
exp(ip.pred$pred+1.96* ip.pred$se)
