STAT 497 - Old Exams2


Dr. Ceylan Yozgatlıgil 26.11.2013
METU-FALL 2013
STAT 497
MIDTERM EXAM

QUESTIONS (Each question is 20 pts.)

1. Consider the following processes, where the {at}’s are assumed to be White Noise with
zero mean and constant variance. For each of the processes,

i. state the name of the process, and
ii. examine the stationarity and invertibility of the processes. If they are not
stationary, try to make them stationary.
a)
b)
c)
2. For the following processes
i. ARIMA(1,2,0)
ii. The multiplicative seasonal ARIMA, SARIMA(0,0,1)(0,2,2)4

a) Write the backshift operator form.


b) Obtain the random shock form and the inverted form, indicating the first four
weights.
c) Can you calculate the autocorrelation function (ACF) of these processes
directly? If not, explain why and take a necessary action to find ACF.
d) Calculate the ACF function of the processes.
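
As a hedged illustration of part a), the two models can be written in backshift form as follows. The parameter names (phi, theta, Theta_1, Theta_2) and the Box-Jenkins sign convention for the MA terms are assumptions, since the question does not fix a notation.

```latex
% ARIMA(1,2,0): one AR term, two regular differences
(1 - \phi B)(1 - B)^2 Y_t = a_t

% SARIMA(0,0,1)(0,2,2)_4: regular MA(1), seasonal MA(2), two seasonal differences at period 4
(1 - B^4)^2 Y_t = (1 - \theta B)(1 - \Theta_1 B^4 - \Theta_2 B^8)\, a_t

% Both processes are nonstationary, so the ACF is computed for the differenced series:
W_t = (1 - B)^2 Y_t \;\Rightarrow\; (1 - \phi B) W_t = a_t, \qquad \rho_k = \phi^k \quad (|\phi| < 1)
W_t = (1 - B^4)^2 Y_t \;\Rightarrow\; W_t = (1 - \theta B)(1 - \Theta_1 B^4 - \Theta_2 B^8)\, a_t
```

Under these assumptions the differenced ARIMA(1,2,0) series is an AR(1), and the differenced seasonal series is a pure moving average whose ACF is zero beyond lag 9.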

3. Identify appropriate time series models for the data from the following sample
autocorrelations assuming exponentially decayed partial autocorrelations.
a) n=56, data
k     1     2     3     4     5     6     7     8     9     10
      0.92  0.83  0.81  0.80  0.71  0.63  0.60  0.58  0.50  0.42   1965.6  373.6
      0.05  0.86  0.04  0.79  0.02  0.12  0.00  0.78  0.07  0.05   22.1    102.9
      0.40  0.11  0.43  0.61  0.22  0.15  0.26  0.15  0.01  0.10   0.16    53.77

b) n=100, data
k     1     2     3     4     5     6     7     8     9     10
      0.97  0.94  0.91  0.88  0.85  0.82  0.79  0.76  0.73  0.70   102.95  57.90
      0.12  0.09  0.09  0.03  0.11  0.07  0.02  0.07  0.09  0.18   2.00    1.03

4. Yearly number of licensed vehicles in UK is obtained from 1929 to 2012. Please answer
the following questions regarding this series.

a) What features of the data are evident in the time series plots and related statistics that
should be captured by a model? Is there other evidence that would be useful in
formulating a modeling strategy?

b) After examining the following test results, what steps should be taken next?

> kpss.test(accident,null=c("Level"))

KPSS Level = 2.977, Truncation lag parameter = 2, p-value = 0.01

> kpss.test(accident,null=c("Trend"))
KPSS Trend = 0.4637, Truncation lag parameter = 2, p-value = 0.01

> pp.test(accident)
Dickey-Fuller Z(alpha) = -5.8212, Truncation lag parameter = 3,
p-value = 0.777

> kpss.test(diff(accident),null=c("Level"))

KPSS Level = 0.6669, Truncation lag parameter = 2, p-value = 0.01656

> kpss.test(diff(accident),null=c("Trend"))
KPSS Trend = 0.1622, Truncation lag parameter = 2, p-value = 0.03647

> pp.test(diff(accident))
Dickey-Fuller Z(alpha) = -50.1856, Truncation lag parameter = 3,
p-value = 0.01
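
The decision logic behind these tests can be sketched in a few lines of base R. This is only an illustration: the p-values are copied from the output above, the 5% level is an assumption, and the helper column names are made up for the example. KPSS takes stationarity as the null hypothesis, whereas the Phillips-Perron test takes a unit root as the null.

```r
# p-values copied from the printed test output (not recomputed from data)
tests <- data.frame(
  series = rep(c("accident", "diff(accident)"), each = 3),
  test   = rep(c("KPSS level", "KPSS trend", "PP"), times = 2),
  p      = c(0.01, 0.01, 0.777, 0.01656, 0.03647, 0.01),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # assumed significance level

# KPSS: H0 = stationary, so rejection points to nonstationarity.
# PP:   H0 = unit root, so rejection points to stationarity.
tests$reject_H0 <- tests$p < alpha
tests$suggests_stationary <- ifelse(tests$test == "PP",
                                    tests$reject_H0,
                                    !tests$reject_H0)
print(tests)
```

Read this way, all three tests point to nonstationarity for the original series, while for the differenced series the PP test and the KPSS tests disagree at the 5% level, which is worth discussing in the answer.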

c) After obtaining a stationary process, the time series plot and ACF/PACF plots of the series
are given below. Interpret these plots. Identify an appropriate ARIMA(p,d,q) model for this
series.

> eacf(accident)
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x x x x x x x x x x x x x x
1 x x o o o o o x o o o o o o
2 x o o o o o o o o o o o o o
3 o x o o o o o o o o o o o o
4 x x o o o o o o o o o o o o
5 o x o o o o o o o o o o o o
6 o o o o o o o o o o o o o o
7 x o o x o o o o o o o o o o

> eacf(diff(accident))
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x x x o o o o o o o o o o o
1 x o o o o o o o o o o o o o
2 o o o o o o o o o o o o o o
3 x o o o o o o o o o o o o o
4 x o o o o o o o o o o o o o
5 x x o o o o o o o o o o o o
6 x o o o o o o o o o o o o o
7 x o x x o x o o o o o o o o

5. Monthly rate of total private hires from January 2001 to August 2013 is collected from
U.S. Department of Labor: Bureau of Labor Statistics to analyze.

a) What features of the data are evident in the time series plots and related statistics that
should be captured by a model? Is there other evidence that would be useful in
formulating a modeling strategy?

b) Interpret the below outputs and discuss what should be the next step. Explain each
output separately.
> HEGY.test(wts =hires, itsd= c(1, 0,c(0)), regvar = 0)

HEGY statistics:

Stat. p-value
tpi_1 -0.703 0.1
tpi_2 -0.009 0.1
Fpi_3:4 1.686 0.1
Fpi_5:6 1.038 0.1
Fpi_7:8 0.096 0.1
Fpi_9:10 0.634 0.1
Fpi_11:12 0.007 0.1
Fpi_2:12 0.647 NA
Fpi_1:12 0.637 NA
> HEGY.test(wts =diff(hires), itsd = c(0, 0,c(0)), regvar = 0)

HEGY statistics:

Stat. p-value
tpi_1 -1.885 0.051
tpi_2 -0.479 0.100
Fpi_3:4 0.381 0.100
Fpi_5:6 1.205 0.100
Fpi_7:8 0.121 0.100
Fpi_9:10 0.942 0.100
Fpi_11:12 0.022 0.100
Fpi_2:12 0.508 NA
Fpi_1:12 0.771 NA

> HEGY.test(wts =diff(diff(hires,12)), itsd = c(0, 0,c(0)), regvar = 0)

HEGY statistics:

Stat. p-value
tpi_1 -3.166 0.01
tpi_2 -3.702 0.01
Fpi_3:4 20.876 0.01
Fpi_5:6 9.334 0.01
Fpi_7:8 21.211 0.01
Fpi_9:10 14.657 0.01
Fpi_11:12 19.293 0.01
Fpi_2:12 21.174 NA
Fpi_1:12 22.971 NA

c) As a result of the operation that you suggest in part b), the series Zt is obtained and
the time series plot, correlogram and sample PACF plot are given below.

i) Interpret the time series plot and discuss the existence of any strange behavior.

ii) Which ARIMA(p,d,q) or SARIMA(p,d,q)(P,D,Q)s process(es) is (are) suitable for this
series? Why?

MIDTERM EXAM I SPRING 2014

1. Consider the following process

where it is assumed that {at}’s are White Noise with zero mean and variance of 2.

a) (5 pts.) Write the process in backshift operator form.


b) (10 pts.) Find the mean and the variance of the process.
2. Consider the process

where it is assumed that {at}’s are White Noise with zero mean and variance.

a) (5 pts.) Is the process stationary and/or invertible? Show your work.


b) (15 pts.) Find the autocorrelation function (ACF) of the process, if possible. If it is
not possible, explain why and try to find ACF after applying some operations.
3. Consider , where represents a trend component satisfying the model

with i.i.d. , and is stationary satisfying

with i.i.d. ; are independent.

a) (15 pts.) Show that follows an ARIMA (p, d, q) model, and determine the values of
p, d and q.
b) (5 pts.) What will the orders p, d, q be if and ?

4. The data set contains normalized tree-ring widths in dimensionless units. The data were recorded by
Donald A. Graybill, 1980, from Gt Basin Bristlecone Pine 2805M, 3726-11810 in
Methuselah Walk, California. It is a univariate time series with 7981 observations. Each tree
ring corresponds to one year. The series runs from 6000 BC to 1979 AD.

a) (5 pts.) What features of the data are evident in the time series plots and related statistics
that should be captured by a model? Is there other evidence that would be useful in
formulating a modeling strategy?
Figure: Time series plot "Yearly Treerings from -6000 to 1979", with the sample ACF and partial ACF of the series treering (lags 0 to 35)

b) (5 pts.) In the light of the following test results, what are the necessary steps that we need to
consider next?

> kpss.test(treering,null=c('Level'))

KPSS Level = 0.0865, Truncation lag parameter = 20, p-value = 0.1

> kpss.test(treering,null=c('Trend'))

KPSS Trend = 0.0676, Truncation lag parameter = 20, p-value = 0.1

c) (10 pts.) After obtaining a stationary process, the time series plot and ACF/PACF plots of
the series are given below. Interpret these plots. Identify an appropriate ARIMA(p,d,q) model for
this series.

The extended sample autocorrelation function
AR/MA
0 1 2 3 4 5 6 7 8 9 10
0 x x x x x x x x x x o
1 x x o o o x o x x x o
2 x o x o o o o o o x o
3 x x o o x o o o o x o
4 x x x x x o o o o x o
5 x x x x x o o o o x o
6 x x x x x o o o o o o
7 x x x x x x x o o o o
8 o x x x x o x x o o o
9 o x x x x x x x x o o
10 x x x x x x x x x x o
> armaselect(treering, max.p= 15, max.q=15, nbmod = 10)
p q sbc
[1,] 2 3 -19662.30
[2,] 3 2 -19660.68
[3,] 2 2 -19659.00
[4,] 3 3 -19653.84
[5,] 2 4 -19653.56
[6,] 4 2 -19651.72
[7,] 4 3 -19650.56
[8,] 3 4 -19648.96
[9,] 1 1 -19648.56
[10,] 2 5 -19647.99

5. We want to analyze the annual volume of foreign trade in Turkey from 1923 to 2013.
a) (5 pts.) What features of the data are evident in the time series plots and related statistics
that should be captured by a model? Is there other evidence that would be useful in
formulating a modeling strategy?

b) (10 pts.) In the light of following test results, what are the necessary steps that we need
to consider next?
> kpss.test(trade,null=c('Level'))

KPSS Level = 1.5906, Truncation lag parameter = 2, p-value = 0.01

> kpss.test(trade,null=c('Trend'))

KPSS Trend = 0.5148, Truncation lag parameter = 2, p-value = 0.01

c) (10 pts.) To obtain a stationary series, a differencing operation is applied to the series and
the following output and plots are obtained. Interpret the plots. Do you think the differencing
operation solves the problem? Explain.

> kpss.test(diff(trade),null=c('Level'))
KPSS Level = 1.1136, Truncation lag parameter = 2, p-value = 0.01

> kpss.test(diff(trade),null=c('Trend'))
KPSS Trend = 0.2729, Truncation lag parameter = 2, p-value = 0.01

> kpss.test(diff(diff(trade)),null=c('Level'))
KPSS Level = 0.0225, Truncation lag parameter = 2, p-value = 0.1

> kpss.test(diff(diff(trade)),null=c('Trend'))
KPSS Trend = 0.021, Truncation lag parameter = 2, p-value = 0.1

Figure: Time series plot "First Order Differenced Volume of Trade in Turkey from 1923 to 2013", with the sample ACF and partial ACF of diff(trade) (lags 0 to 35)

Figure: Time series plot "Second Order Differenced Volume of Trade in Turkey from 1923 to 2013", with the sample ACF and partial ACF of diff(diff(trade)) (lags 0 to 35)

MIDTERM EXAM FALL 2014

1. Suppose that Xt follows the model (1 − 0.87B + 0.27B^2)Xt = at, where at is a white noise
with variance σ^2 = 4, and X100 = 3.0 and X99 = −1.0.
a) What is the 2-step ahead forecast of X102 at the forecast origin t = 100?
b) What is the variance of the associated forecast error? Find the 95% prediction limits for
X102.
c) If the value of X101 is obtained as 2.0, what is the updated forecast for X102?
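
The calculation asked for here can be sketched in base R. This is a minimal illustration under stated assumptions: the model has zero mean (no constant appears in the equation), the shocks are Gaussian so that normal quantiles give the limits, and all variable names are made up for the example.

```r
# AR(2) written as X_t = 0.87 X_{t-1} - 0.27 X_{t-2} + a_t
phi    <- c(0.87, -0.27)
sigma2 <- 4                 # white noise variance
x100   <- 3.0
x99    <- -1.0

# Forecasts from origin t = 100: replace future shocks by zero
f1 <- phi[1] * x100 + phi[2] * x99    # forecast of X101
f2 <- phi[1] * f1   + phi[2] * x100   # forecast of X102

# The 2-step forecast error is a_102 + psi1 * a_101, with psi1 = phi1
psi1   <- phi[1]
var_e2 <- sigma2 * (1 + psi1^2)
limits <- f2 + c(-1, 1) * qnorm(0.975) * sqrt(var_e2)

# Updating once X101 = 2.0 is observed
x101       <- 2.0
f2_updated <- f2 + psi1 * (x101 - f1)

c(f1 = f1, f2 = f2, var_e2 = var_e2, f2_updated = f2_updated)
limits
```

With these inputs the 1-step forecast is 2.88, the 2-step forecast is 1.6956 with error variance 7.0276, the 95% limits are roughly (-3.50, 6.89), and the updated forecast is 0.93, which equals 0.87(2.0) − 0.27(3.0) computed directly from origin t = 101.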

2. For the following processes

where at is a White Noise r.v. with 0 mean and variance 1.


a) Write the backshift operator form and find the roots of the AR and MA
polynomials.
b) Can you calculate the random shock form and the inverted form of this
process directly? If not, explain why and take a necessary action to obtain the
random shock form and the inverted form, indicating the first four weights.
c) Can you calculate the autocorrelation function (ACF) of the Yt process directly? If
not, explain why and take a necessary action to find the ACF.
d) Calculate the ACF of the process.

3. If the autocorrelation function of a process is obtained as

where s is the seasonal period, find the 1-step ahead and 2-step ahead forecasts of the process
with their forecast error variances.
4. Consider the monthly US Retail Prices for Sporting Goods, Hobby, Book and Music
Stores from January 1992 to October 2014. Please answer the following questions
regarding this series.

a) What features of the data are evident in the time series plots and related statistics that
should be captured by a model? Is there other evidence that would be useful in
formulating a modeling strategy?

Figure: Time series plot "Monthly US Retail Prices for Sporting Goods, Hobby, Book and Music Stores from Jan 1992 to Oct 2014" (in $), with the sample ACF and partial ACF of the series rm

b) After examining the following test results, what steps should be taken next?

> kpss.test(rm,null=c("Level"))

KPSS Level = 6.0508, Truncation lag parameter = 3, p-value = 0.01

> kpss.test(rm,null=c("Trend"))

KPSS Trend = 1.4286, Truncation lag parameter = 3, p-value = 0.01

> pp.test(rm)
Dickey-Fuller Z(alpha) = -10.4911, Truncation lag parameter = 5,
p-value = 0.5218

> ndiffs(rm, alpha=0.05, test=c("adf"), max.d=2)
[1] 1
> nsdiffs(rm, m=frequency(rm), test=c("ocsb"), max.D=2)
[1] 0

c) After obtaining a stationary process, the time series plot and ACF/PACF plots of the series
are given below. Interpret these plots. Identify an appropriate ARIMA(p,d,q) model for this
series.

Figure: Time Series Plot of the First Order Differenced Series (in $), with the sample ACF and partial ACF of as.vector(diff(rm)) (lags 0 to 35)

d) Before ARIMA estimation and forecasting, we want to see the exponential smoothing forecast
of this dataset. What can you say about the exponential smoothing model and its forecast plot?

>fit=ets(rm)

ETS(M,A,N)

Call:
ets(y = rm)

Smoothing parameters:
alpha = 0.5911
beta = 1e-04

Initial states:
l = 4011.5994
b = 15.4235

sigma: 0.018

AIC AICc BIC
4106.384 4106.533 4120.822

> plot(forecast(fit,level=c(95)))

Figure: Forecasts from ETS(M,A,N)


5. The monthly average temperature of Zonguldak station is analyzed from January 2000 to
December 2010.

a) What features of the data are evident in the time series plots and related statistics that
should be captured by a model? Is there other evidence that would be useful in
formulating a modeling strategy?

b) The following model fit is obtained for the series. Please interpret the results.

> tt=time(t)-2000
> Q = factor(rep(1:12,11))
> reg = lm(t~Q+tt, na.action=NULL)
> summary(reg)

Call:
lm(formula = t ~ Q + tt, na.action = NULL)

Residuals:
Min 1Q Median 3Q Max
-3.9570 -0.9226 -0.0070 0.6977 4.1770
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.65152 0.45445 8.035 7.65e-13 ***
Q2 0.82662 0.58467 1.414 0.16003
Q3 3.33505 0.58470 5.704 8.70e-08 ***
Q4 6.99803 0.58474 11.968 < 2e-16 ***
Q5 11.41556 0.58480 19.521 < 2e-16 ***
Q6 15.86035 0.58487 27.118 < 2e-16 ***
Q7 18.64152 0.58497 31.868 < 2e-16 ***
Q8 18.52268 0.58508 31.659 < 2e-16 ***
Q9 14.20384 0.58520 24.272 < 2e-16 ***
Q10 9.91227 0.58535 16.934 < 2e-16 ***
Q11 5.26616 0.58551 8.994 4.49e-15 ***
Q12 1.35641 0.58568 2.316 0.02227 *
tt 0.11697 0.03774 3.099 0.00242 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.371 on 119 degrees of freedom
Multiple R-squared: 0.9626, Adjusted R-squared: 0.9588
F-statistic: 255.4 on 12 and 119 DF, p-value: < 2.2e-16
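
To make the coefficient table concrete, the sketch below computes one fitted value by hand in base R. It assumes that `t` is a monthly `ts` object starting in January 2000, so that level k of `Q` is calendar month k and `tt` is measured in years since 2000. The variable names are illustrative only.

```r
# Coefficients copied from the summary(reg) output
coefs <- c(intercept = 3.65152, Q7 = 18.64152, tt = 0.11697)

# For a monthly ts starting in January 2000, time(t) is year + (month - 1)/12,
# so July 2005 corresponds to 2005.5 and tt = 5.5
tt_july_2005 <- (2005 + 6 / 12) - 2000

# Fitted mean temperature for July 2005:
# January baseline + July effect + linear trend contribution
fitted_july_2005 <- coefs[["intercept"]] + coefs[["Q7"]] +
  coefs[["tt"]] * tt_july_2005
fitted_july_2005
```

Under these assumptions the intercept is the January level at the start of 2000, each Q coefficient is the difference between that month and January, and the `tt` coefficient is an estimated warming of about 0.12 degrees per year.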

Figure: The plot of the original and predicted monthly average temperature series

FINAL EXAM FALL 2014

1. Please answer the following questions.


a) Why do we require that a time series be stationary in order to apply ARMA modelling?
b) If our goal is only to produce a long run forecast, is there any advantage to using an
ARMA model?
c) Describe what we mean by a trend stationary time series.
d) Describe what we mean by a difference stationary time series.

2. Consider the process

where
a) Compute the autocorrelation function (ACF) of this process.
b) Is there anything unusual about this ACF?
c) Given information through date n, compute point forecasts for yn+1 and yn+2 with their
forecast error variances.
3. From a series of length 100, we have computed r1 = 0.8, r2 = 0.4, r3 = 0.2, and a
sample variance (lag-0 sample autocovariance) of 1.

a) Assume that an ARMA(1,2) model: is appropriate
for this data set. Find an estimator of , using the method of
moments.
b) Assuming the same values given in the question with mean 1,
consider that an AR(1) model is appropriate for this data set. Find a
Yule-Walker estimator for the parameter (No need to estimate the
error variance).
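
The moment equations behind this question can be sketched as follows. The model equations are not legible in the source, so the parameter name phi and the mean-centred AR(1) form are assumptions made only for this illustration.

```latex
% Assumed AR(1) form with mean \mu:  (Y_t - \mu) = \phi (Y_{t-1} - \mu) + a_t
% Yule-Walker equation for AR(1):
\rho_1 = \phi \quad\Rightarrow\quad \hat{\phi} = r_1 = 0.8

% For an ARMA(1,q) process the ACF satisfies \rho_k = \phi \rho_{k-1} for k > q.
% With q = 2 this gives a moment estimator of the AR coefficient from lags 2 and 3:
\rho_3 = \phi \rho_2 \quad\Rightarrow\quad \hat{\phi} = \frac{r_3}{r_2} = \frac{0.2}{0.4} = 0.5
```

The mean does not enter the Yule-Walker estimate of phi, because the autocorrelations are computed from mean-centred data.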

4. Monthly Sugar price indices (2002-2004=100) from January 1990 to April 2002 are
analyzed.

a) What features of the data are evident in the time series plots and related statistics that
should be captured by a model? Is there other evidence that would be useful in
formulating a modeling strategy?

b) Interpret the below outputs and discuss what should be the next step.

> kpss.test(sugar,null=c("Level"))

KPSS Level = 2.2245, Truncation lag parameter = 3, p-value = 0.01

> kpss.test(sugar,null=c("Trend"))

KPSS Trend = 1.0755, Truncation lag parameter = 3, p-value = 0.01

> pp.test(sugar)
Dickey-Fuller Z(alpha) = -10.536, Truncation lag parameter = 5, p-value= 0.5192

> ndiffs(sugar, alpha=0.05, test=c("adf"), max.d=2)
[1] 1

> nsdiffs(sugar, m=frequency(sugar), test=c("ocsb"), max.D=2)
[1] 0

c) After the first-order difference, we obtain the following plots:

Based on these plots, please identify an appropriate ARIMA or SARIMA model with orders.
Explain your selection procedure briefly.

d) You can see six different fitted models for the series below. Interpret the results and
choose one model as the best one for this time series. Explain the reasoning.
> fit1=arima(sugar,order=c(2,1,1),seasonal=list(order=c(1,0,1), period=12))

Coefficients:

ar1 ar2 ma1 sar1 sma1

1.2912 -0.421 -0.8979 0.8443 -0.7281

s.e. 0.0905 0.056 0.0848 0.1143 0.1375

sigma^2 estimated as 172.3: log likelihood=-1063.02

AIC=2136.04 AICc=2136.37 BIC=2157.54

> fit2=arima(sugar,order=c(2,1,7),seasonal=list(order=c(1,0,1), period=12))

Coefficients:

ar1 ar2 ma1 ma2 ma3 ma4 ma5 ma6

1.1896 -0.8227 -0.8128 0.3902 0.2469 0.0184 0.0805 -0.1263

s.e. 0.0537 0.0522 0.0797 0.0899 0.0874 0.0902 0.0991 0.0923

ma7 sar1 sma1

-0.1168 -0.1647 0.2259

s.e. 0.0739 NaN NaN

sigma^2 estimated as 163.5: log likelihood=-1058.04

AIC=2138.09 AICc=2139.32 BIC=2181.09

> fit3=arima(sugar,order=c(2,1,0),seasonal=list(order=c(1,1,1), period=12))

Coefficients:

ar1 ar2 sar1 sma1

0.4218 -0.0927 -0.0210 -0.8039

s.e. 0.0627 0.0633 0.0825 0.0523

sigma^2 estimated as 185.3: log likelihood=-1030.11

AIC=2068.21 AICc=2068.46 BIC=2085.9

> fit4=arima(sugar,order=c(1,1,0),seasonal=list(order=c(0,1,1), period=12))

Coefficients:

ar1 sma1

0.3850 -0.8041

s.e. 0.0577 0.0427

sigma^2 estimated as 187.2: log likelihood=-1031.19

AIC=2066.37 AICc=2066.47 BIC=2076.98

> fit5=arima(sugar,order=c(2,1,1),seasonal=list(order=c(1,1,0), period=12))

Coefficients:

ar1 ar2 ma1 sar1

0.1120 0.0288 0.3203 -0.4687

s.e. 0.4621 0.2033 0.4564 0.0594

sigma^2 estimated as 241.1: log likelihood=-1058.59

AIC=2125.19 AICc=2125.43 BIC=2142.87

> fit6=arima(sugar,order=c(1,1,1),seasonal=list(order=c(0,1,1), period=12))

Coefficients:
ar1 ma1 sma1
0.1987 0.2218 -0.8096
s.e. 0.1505 0.1499 0.0428

sigma^2 estimated as 185.4: log likelihood=-1030.18


AIC=2066.35 AICc=2066.51 BIC=2080.5
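
One way to organise the comparison of the six fits is sketched below in base R. All numbers are copied from the output above. The rule of thumb |estimate / s.e.| > 2 for significance is an assumption of this sketch, not something stated in the exam.

```r
# Information criteria copied from the six printed fits
fits <- data.frame(
  model = paste0("fit", 1:6),
  D     = c(0, 0, 1, 1, 1, 1),   # seasonal differencing order of each fit
  AIC   = c(2136.04, 2138.09, 2068.21, 2066.37, 2125.19, 2066.35),
  BIC   = c(2157.54, 2181.09, 2085.90, 2076.98, 2142.87, 2080.50),
  stringsAsFactors = FALSE
)

best_aic <- fits$model[which.min(fits$AIC)]
best_bic <- fits$model[which.min(fits$BIC)]

# Approximate t-ratios (estimate / standard error) for the two closest candidates
t_fit4 <- c(ar1 = 0.3850, sma1 = -0.8041) / c(0.0577, 0.0427)
t_fit6 <- c(ar1 = 0.1987, ma1 = 0.2218, sma1 = -0.8096) / c(0.1505, 0.1499, 0.0428)

print(fits)
c(best_aic = best_aic, best_bic = best_bic)
round(t_fit4, 2)
round(t_fit6, 2)
```

On these numbers fit6 has the smallest AIC by only 0.02, fit4 has the smallest BIC, and the nonseasonal terms of fit6 have t-ratios below 2, so the more parsimonious fit4 is a defensible choice. Likelihood-based criteria are only directly comparable between fits with the same differencing, so fit1 and fit2 (D = 0) should be compared with the others with caution.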

e) After deciding on the best model, we checked the diagnostics of the model. Please
interpret each output. If assumptions are not satisfied, suggest a method to solve the
problem.

> shapiro.test(window(rstandard(fit),start=c(1990,1)))

W = 0.897, p-value = 1.609e-12

> jarque.bera.test(rstandard(fit))

X-squared = 840.8252, df = 2, p-value < 2.2e-16

> Box.test(rstandard(fit),lag=15,type = c("Ljung-Box"))

X-squared = 21.9062, df = 15, p-value = 0.1103

> Box.test(rstandard(fit),lag=15,type = c("Box-Pierce"))

X-squared = 21.0445, df = 15, p-value = 0.1354

f) Squared residuals are obtained and their ACF and PACF plots are given as follows.
Why are we looking at the squared residuals? What do these graphs tell us?

g) We obtained the following forecasts and prediction limits. What can you say about
these forecasts and prediction limits? Are they plausible?

5. Consider two-dimensional vector AR(1) processes

i. Is the series I(1) or I(0)? Show.


ii. Is Yt and/or Xt cointegrated? If so, find the cointegrating vector for the series.
iii. Write the processes in error correction representation form if they are cointegrated.

6. Consider the quarterly seasonally adjusted real gross domestic product, real imports of
goods and services and real exports of goods and services of U.S. from Q1-1947 to Q3-
2014.

a) What is the problem when nonstationary series are used in regression analysis?
b) Explain the logic of Granger Causality.
c) We perform a Granger-causality test and results are given in the below table. Analyze the
results of Granger-causality Wald test at 5% significance level. Discuss whether the
results are reasonable or not?

F df p-value

export import 7.86 4 0.000

export gdp 3.41 4 0.009

import export 4.06 4 0.003

import gdp 8.97 4 0.000

gdp export 2.86 4 0.002

gdp import 11.44 4 0.000

MIDTERM EXAM 2016

1. Please answer the following questions.


a) (5 pts.) Explain the meaning of stationarity. Why do we need stationarity in time
series analysis?
b) (5 pts.) If there is a unit root in the MA polynomial, what could be the possible
reason for this?
c) (5 pts.) What is the asymptotic distribution of the unit root test statistic for the ADF test?
Why do we have three different versions of it? Please explain.
d) (5 pts.) What type of tools do we have to identify the order of ARMA processes?
Explain how we choose the most appropriate model(s) for our series.

2. Consider the following process

where it is assumed that {at}’s are White Noise with zero mean and variance.

a) (4 pts.) Write the process in backshift operator form.
b) (5 pts.) Is the process stationary and/or invertible?
c) (5 pts.) Find the mean and the variance of the process.
d) (8 pts.) Write the model in random shock form (You can find the first 3 weights only).
e) (8 pts.) Write the model in inverted form (You can find the first 3 weights only).

3. Consider the model where ’s are WN


with mean 0 and .

a) (5 pts.) Is the process stationary and/or invertible? Explain.


b) (10 pts.) Can you calculate the autocorrelation function (ACF) of these processes
directly? If not, explain why and take a necessary action to find ACF and then calculate
the ACF function of the processes.

4. Annual data on U.S. primary and total energy consumption by end-use
sector (residential, commercial, industrial, transportation) and electric
power sector are collected from 1949 to 2015. Our interest is the total energy
consumed by the transportation sector. The data are reported in trillion Btu. The mean of the
series is obtained as 19268.16.

a) (5 pts.) What features of the data are evident in the time series plots and related statistics
that should be captured by a model? Is there other evidence that would be useful in
formulating a modeling strategy?

b) (10 pts.) In the light of the following test results, what are the necessary steps that we need to
consider next?

> kpss.test(energy,null=c('Level'))

KPSS Level = 3.3434, Truncation lag parameter = 1, p-value = 0.01

> kpss.test(energy,null=c('Trend'))

KPSS Trend = 0.39927, Truncation lag parameter = 1, p-value = 0.01

> adfTest(energy, lags = 1, type = c("nc"))

STATISTIC:
Dickey-Fuller: 2.179
P VALUE:
0.99

> adfTest(energy, lags = 1, type = c("c"))

STATISTIC:
Dickey-Fuller: -1.0895
P VALUE:
0.6501

> adfTest(energy, lags = 1, type = c("ct"))

STATISTIC:
Dickey-Fuller: -1.8473
P VALUE:
0.6371

> kpss.test(diff(energy),null=c('Level'))

KPSS Level = 0.22009, Truncation lag parameter = 1, p-value = 0.1

> kpss.test(diff(energy),null=c('Trend'))

KPSS Trend = 0.080175, Truncation lag parameter = 1, p-value = 0.1

The mean of the differenced series is 297.1021.

> adfTest(diff(energy), lags = 1, type = c("nc"))


STATISTIC:
Dickey-Fuller: -3.4928
P VALUE:
0.01

> adfTest(diff(energy), lags = 1, type = c("c"))


STATISTIC:
Dickey-Fuller: -4.5628
P VALUE:
0.01

> adfTest(diff(energy), lags = 1, type = c("ct"))


STATISTIC:
Dickey-Fuller: -4.5958
P VALUE:
0.01

c) (5 pts.) After obtaining a stationary process, the time series plot and ACF/PACF plots of the
series are given below. Interpret these plots. Identify an appropriate ARIMA(p,d,q) model for
this series.

> eacf(diff(energy))
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o o o o o o o o o o o o
1 o o o o o o o o o o o o o o
2 o o o o o o o o o o o o o o
3 o x o o o o o o o o o o o o
4 o x o o o o o o o o o o o o
5 x o x o o o o o o o o o o o
6 o o x o o o o o o o o o o o
7 x o o x o o o o o o o o o o

> armaselect(de, max.p = 15, max.q = 15, nbmod = 10)


p q sbc
[1,] 1 0 805.6471
[2,] 2 0 810.4900
[3,] 0 0 811.7757
[4,] 3 0 814.6980
[5,] 4 0 819.6369
[6,] 0 1 820.9781
[7,] 15 15 822.5019
[8,] 1 1 822.6152
[9,] 15 14 823.0902
[10,] 5 0 823.6761

5. We would like to analyze the series Y.

a) (5 pts.) Interpret the following plots.

b) (5 pts.) Interpret the following outputs.


> kpss.test(y,null=c('Level'))

KPSS Level = 5.0961, Truncation lag parameter = 3, p-value = 0.01

> kpss.test(y,null=c('Trend'))

KPSS Trend = 0.013012, Truncation lag parameter = 3, p-value = 0.1

c) (5 pts.) Explain the reason for using the following type of model and interpret all the results.
>reg = lm(y~t, na.action=NULL)

>summary(reg)

Residuals:

Min 1Q Median 3Q Max

-4.3382 -0.9663 0.1070 1.0498 3.7859

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 100.10005 0.21558 464.3 <2e-16 ***

t -8.00046 0.00186 -4301.3 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.519 on 198 degrees of freedom

Multiple R-squared: 1, Adjusted R-squared: 1

F-statistic: 1.85e+07 on 1 and 198 DF, p-value: < 2.2e-16

Figure: Plot of the original and predicted series

Figure: Plots for the residual series

FINAL EXAM 2017

1. A time series of n = 200 observations gave the following sample ACF and sample PACF
for the original data,
Lag k   1       2       3       4       5       6       7       8       9       10
ACF     0.901   0.732   0.659   0.695   0.728   0.676   0.584   0.541   0.560   0.571
PACF    0.901  -0.422   0.652   0.039  -0.095  -0.037   0.071   0.077  -0.054  -0.022

with the sample mean and sample variance and for the first differenced
data we have
Lag k   1       2       3       4       5       6       7       8       9       10
ACF     0.362  -0.491  -0.568   0.027   0.453   0.233  -0.253  -0.320   0.019   0.256
PACF    0.362  -0.715  -0.011   0.072   0.023  -0.067  -0.086   0.076  -0.039  -0.021

with mean -0.01 and variance 1.60.


a) Identify the model from the given information.
b) Obtain the Yule-Walker estimates of the parameters. Also, obtain estimates for the error
variance.
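
As a hedged sketch of part b), the Yule-Walker equations for an AR(2) model fitted to the first-differenced data can be solved in base R. The choice of AR(2) is an assumption of this illustration (motivated by the differenced-series PACF being near zero beyond lag 2), and the variable names are made up for the example.

```r
# Sample autocorrelations and variance of the first-differenced series
r1     <- 0.362
r2     <- -0.491
gamma0 <- 1.60

# Yule-Walker system for AR(2): R %*% phi = r
R   <- matrix(c(1, r1,
                r1, 1), nrow = 2, byrow = TRUE)
r   <- c(r1, r2)
phi <- solve(R, r)

# Moment estimate of the error variance
sigma2 <- gamma0 * (1 - sum(phi * r))

round(c(phi1 = phi[1], phi2 = phi[2], sigma2 = sigma2), 4)
```

The solution is approximately phi1 = 0.621 and phi2 = -0.716, with an error variance of about 0.678. The estimate of phi2 agrees with the lag-2 sample PACF of the differenced series (-0.715), as it should for a Yule-Walker fit.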
2. Consider the following multiplicative seasonal ARIMA model

where ’s are WN with mean 0 and .


a) Find the mean of the process.
b) If we have 100 data points, obtain first three-step ahead minimum MSE
forecasts for , i.e., , using the following values:

Y94=22 Y95=20 Y96=19 Y97=23 Y98=18 Y99=20 Y100=21

a94=0.2 a95=0.2 a96=0.15 a97=0.5 a98= -0.3 a99= 0.2 a100=0.01

c) Calculate the forecast error variances for , and construct 95%


prediction intervals for
d) Assume that one quarter passed and we obtained the value of Y101 as 19. Please update the
forecast for Y102.

3. The industrial production (IP) index measures the real output of all relevant establishments
located in the United States, regardless of their ownership, but not those located in U.S.
territories. Monthly Industrial Production: Electric and gas utilities (Index 2012=100) series is
analyzed considering the time period Jan1972-Nov2016. We kept last 11 observations out of
analysis to see the forecasting performance.

a) What features of the data are evident in the time series plots and related statistics
that should be captured by a model? Is there other evidence that would be useful in
formulating a modeling strategy?

b) Interpret the outputs below and discuss what the next step should be. Explain each output
separately.

HEGY.test(wts=elgas, itsd=c(1,0,c(0)),regvar=0, selectlags=list(mode="aic", Pmax=12))

$stats
Stat. p-value
tpi_1 -1.341637 0.10
tpi_2 -7.343211 0.01
Fpi_3:4 45.754869 0.01
Fpi_5:6 43.150266 0.01

Fpi_7:8 44.970765 0.01
Fpi_9:10 59.857036 0.01
Fpi_11:12 37.151564 0.01
Fpi_2:12 84.145311 NA

HEGY.test(wts =diff(elgas), itsd = c(0, 0,c(0)), regvar = 0)

$stats
Stat. p-value
tpi_1 -3.434966 0.01
tpi_2 -4.792731 0.01
Fpi_3:4 32.020651 0.01
Fpi_5:6 24.134387 0.01
Fpi_7:8 31.267142 0.01
Fpi_9:10 12.350985 0.01
Fpi_11:12 46.627446 0.01
Fpi_2:12 42.161938 NA
Fpi_1:12 45.666588 NA

HEGY.test(wts =diff(diff(elgas),12), itsd = c(0, 0,c(0)), regvar = 0)


$stats
Stat. p-value
tpi_1 -11.542968 0.01
tpi_2 -7.320834 0.01
Fpi_3:4 49.778707 0.01
Fpi_5:6 44.268949 0.01
Fpi_7:8 60.179290 0.01
Fpi_9:10 57.626890 0.01
Fpi_11:12 78.680526 0.01
Fpi_2:12 56.108693 NA
Fpi_1:12 53.318223 NA

c) As a result of the operation that you suggest in part b), the series Zt is obtained and the time
series plot, correlogram and sample PACF plot are given below.
i) Interpret the time series plot and discuss the existence of any strange behavior.
ii) Which ARIMA(p,d,q) or SARIMA(p,d,q)(P,D,Q)s process(es) is (are) suitable for this
series? Why?

d) Some estimation results are given below for industrial production of electric and gas. Please
interpret each output and suggest only one model as the best one describing the behavior of
the series.
> fit1=arima(elgas,order=c(4,1,2),seasonal=list(order=c(2,0,2), period=12))
> fit1

Coefficients:
ar1 ar2 ar3 ar4 ma1 ma2 sar1 sar2
1.3432 -0.7825 0.1444 -0.2356 -1.6517 0.9668 0.4818 -0.4319
s.e. 0.0483 0.0741 0.0734 0.0442 0.0232 0.0233 0.5002 0.2501
sma1 sma2
-0.5742 0.2800
s.e. 0.5321 0.3398

sigma^2 estimated as 2.029: log likelihood = -936.07, aic = 1894.13

> fit2=arima(elgas,order=c(1,1,3),seasonal=list(order=c(2,0,1), period=12))


> fit2

Coefficients:
ar1 ma1 ma2 ma3 sar1 sar2 sma1
-0.3852 0.1204 -0.3327 -0.0226 0.1256 -0.2342 -0.224
s.e. 0.3544 0.3555 0.0943 0.0965 0.1304 0.0464 0.132

sigma^2 estimated as 2.124: log likelihood = -947.2, aic = 1910.4

> fit3=arima(elgas,order=c(4,1,2),seasonal=list(order=c(2,1,1), period=12))


> fit3

Coefficients:
ar1 ar2 ar3 ar4 ma1 ma2 sar1 sar2
0.2834 -0.0600 0.1033 -0.0929 -0.5770 -0.1352 -0.1144 -0.2548
s.e. 0.5029 0.3015 0.0753 0.0944 0.5048 0.4480 0.0457 0.0462
sma1
-0.9888
s.e. 0.0973

sigma^2 estimated as 2.046: log likelihood = -939.87, aic = 1899.74

> fit4= arima(elgas,order=c(2,1,4),seasonal=list(order=c(3,0,1), period=12))


> fit4

Coefficients:
ar1 ar2 ma1 ma2 ma3 ma4 sar1 sar2
-0.4003 -0.7678 0.1457 0.4329 -0.2111 -0.2122 -0.9353 -0.2986
s.e. 0.2413 0.1862 0.2410 0.1629 0.1088 0.0533 0.1251 0.0627
sar3 sma1
-0.2597 0.8663
s.e. 0.0472 0.1240

sigma^2 estimated as 2.097: log likelihood = -944.16, aic = 1908.31


> fit5=arima(elgas,order=c(2,1,1),seasonal=list(order=c(1,1,1), period=12))
> fit5

Coefficients:
ar1 ar2 ma1 sar1 sma1
0.4905 -0.0831 -0.7977 -0.0801 -1.0000
s.e. 0.0746 0.0543 0.0631 0.0452 0.0295

sigma^2 estimated as 2.243: log likelihood = -962.67, aic = 1937.33

> fit6=arima(elgas,order=c(2,1,2),seasonal=list(order=c(2,0,1), period=12))


> fit6

Coefficients:
ar1 ar2 ma1 ma2 sar1 sar2 sma1
-0.3155 0.0362 0.0514 -0.3492 0.1228 -0.2337 -0.2210
s.e. 0.1335 0.1142 0.1274 0.1221 0.1242 0.0455 0.1264

sigma^2 estimated as 2.124: log likelihood = -947.18, aic = 1910.37

> fit7=arima(elgas,order=c(3,1,2),seasonal=list(order=c(2,0,1), period=12))


> fit7

Coefficients:
ar1 ar2 ar3 ma1 ma2 sar1 sar2 sma1
-0.1616 0.0584 0.0543 -0.1029 -0.3274 0.1225 -0.2298 -0.2201
s.e. 0.2935 0.1189 0.0813 0.2907 0.1443 0.1331 0.0469 0.1349

sigma^2 estimated as 2.122: log likelihood = -946.98, aic = 1911.97

e) After deciding on the best model, we checked the diagnostics of the model. Please interpret
each output. If assumptions are not satisfied, suggest a method to solve the problem.

> shapiro.test(window(rstandard(fit),start=c(1972,1)))

W = 0.97475, p-value = 6.703e-08

> jarque.bera.test(rstandard(fit))

X-squared = 133.32, df = 2, p-value < 2.2e-16

> Box.test(rstandard(fit),lag=15,type = c("Ljung-Box"))

X-squared = 19.382, df = 15, p-value = 0.1969

> Box.test(rstandard(fit),lag=15,type = c("Box-Pierce"))

X-squared = 18.933, df = 15, p-value = 0.2168
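As a reading aid, the p-values of the chi-square based tests above can be recovered from the printed statistics and degrees of freedom. The sketch below uses only base R and the numbers shown in the output; the 5% level is an assumed convention.

```r
# Upper-tail chi-square probabilities for the printed test statistics.
p_jarque_bera <- pchisq(133.32, df = 2,  lower.tail = FALSE)
p_ljung_box   <- pchisq(19.382, df = 15, lower.tail = FALSE)
p_box_pierce  <- pchisq(18.933, df = 15, lower.tail = FALSE)

alpha <- 0.05  # assumed significance level
decisions <- c(
  normality_rejected       = p_jarque_bera < alpha,
  autocorrelation_detected = p_ljung_box < alpha || p_box_pierce < alpha
)
print(signif(c(JB = p_jarque_bera, LB = p_ljung_box, BP = p_box_pierce), 4))
print(decisions)
```

At this level normality is rejected while neither portmanteau test signals remaining autocorrelation. `Box.test` also accepts a `fitdf` argument that reduces the degrees of freedom by the number of fitted ARMA coefficients; the calls above leave it at its default of 0.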

The squared residuals are computed and their ACF and PACF plots are obtained as follows. Why are
we looking at the squared residuals? What do these graphs tell us?
> rr=rstandard(fit)^2

> ArchTest(rstandard(fit), lags=12, demean = FALSE)

ARCH LM-test
Chi-squared = 20.554, df = 12, p-value = 0.05731
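To illustrate why squared residuals are examined, the sketch below simulates a hypothetical ARCH(1) series (the parameter values 0.5 and 0.4, the seed and the sample size are illustrative assumptions, unrelated to the elgas data). Such a series is serially uncorrelated in levels but its squares are autocorrelated, which is the pattern the ACF and PACF of squared residuals are meant to reveal. The last line recovers the p-value of the ARCH LM test printed above.

```r
set.seed(497)          # arbitrary seed for reproducibility
n      <- 2000         # hypothetical sample size
alpha0 <- 0.5          # hypothetical ARCH intercept
alpha1 <- 0.4          # hypothetical ARCH(1) coefficient

# Simulate e_t = sigma_t * z_t with sigma_t^2 = alpha0 + alpha1 * e_{t-1}^2.
eps <- numeric(n)
eps[1] <- rnorm(1, sd = sqrt(alpha0 / (1 - alpha1)))
for (t in 2:n) {
  sigma2_t <- alpha0 + alpha1 * eps[t - 1]^2
  eps[t]   <- rnorm(1, sd = sqrt(sigma2_t))
}

# Sample autocorrelations of the levels and of the squares (lag 0 dropped).
acf_levels  <- acf(eps,   lag.max = 12, plot = FALSE)$acf[-1]
acf_squares <- acf(eps^2, lag.max = 12, plot = FALSE)$acf[-1]
print(round(rbind(levels = acf_levels[1:4], squares = acf_squares[1:4]), 3))

# p-value of the ARCH LM statistic reported in the output above.
arch_p <- pchisq(20.554, df = 12, lower.tail = FALSE)
print(arch_p)
```

For the exam output, the LM p-value lies just above 0.05, so at the 5% level the null of no ARCH effect is not rejected, although the evidence is borderline.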

j) We obtained the following forecasts and prediction limits. What can you say about these
forecasts and prediction limits? Are they plausible?

k) We also applied exponential smoothing methods to the series. To compare the exponential
smoothing results with those of the ARIMA model, we computed accuracy measures for the fitted
values against the original series, and for the forecasts against the last 11 observed values
of the series. Please interpret the outputs below and comment on which method better describes
the actual pattern of the series.
> accuracy(fitted(fit),elgas)
                ME     RMSE      MAE      MPE     MAPE        ACF1 Theil's U
Test set 0.2644343 1.446644 1.078247 0.353284 1.369643 -0.03846985 0.9245192

> original=ts(c(101.4898,100.4082,96.3485,102.2796,101.8051,104.8063,105.1380,
108.1116,105.1481,102.2434,97.6937),start=2016,frequency=12)

> accuracy(forecast$pred,original)
               ME     RMSE     MAE      MPE     MAPE      ACF1 Theil's U
Test set 2.304757 3.657833 3.22604 2.168638 3.119247 0.2974644  1.077506

> fitets=ets(elgas)
> fitets
ETS(M,A,N)

Smoothing parameters:
alpha = 0.6732
beta = 1e-04

Initial states:
l = 45.2253
b = 0.1257

sigma: 0.0188

     AIC     AICc      BIC
3693.019 3693.096 3710.096

> forets=forecast(fitets, h=11)

> accuracy(forets,original)
                      ME     RMSE      MAE         MPE     MAPE     MASE
Training set -0.03572081 1.551127 1.160001 -0.05735221 1.457766 0.475965
Test set      2.68807709 4.168534 3.680907  2.53036388 3.554188 1.510329
                   ACF1 Theil's U
Training set 0.04905469        NA
Test set     0.38412884  1.261246
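The accuracy measures in these tables follow simple formulas on the error e = actual - predicted. The sketch below defines a small helper (the name `accuracy_measures` and the three-point toy data are hypothetical) and then compares the test-set RMSE and MAPE values reported above for the two methods.

```r
# Hypothetical helper computing the main measures from e = actual - predicted.
accuracy_measures <- function(actual, predicted) {
  e <- actual - predicted
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(100 * e / actual),
    MAPE = mean(abs(100 * e / actual)))
}

# Toy data for illustration only (not from the elgas series).
toy <- accuracy_measures(actual = c(100, 102, 98), predicted = c(101, 100, 99))
print(round(toy, 4))

# Test-set values copied from the two accuracy() outputs above.
test_set <- rbind(ARIMA = c(RMSE = 3.657833, MAPE = 3.119247),
                  ETS   = c(RMSE = 4.168534, MAPE = 3.554188))
better <- rownames(test_set)[which.min(test_set[, "RMSE"])]
print(test_set)
print(better)
```

On the reported numbers the ARIMA forecasts have the lower test-set RMSE and MAPE. It may also be relevant that the selected ETS(M,A,N) model has no seasonal component.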

4. Consider two-dimensional vector AR(1) processes

Y t=

X t=

a) Is each series I(1) or I(0)? Show your work.
b) Are Yt and Xt cointegrated? If so, find the cointegrating vector.
c) If the processes are cointegrated, write them in error correction representation form.
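The kind of check asked for in parts a) to c) can be sketched numerically for a bivariate VAR(1), Z_t = Phi Z_{t-1} + a_t. The coefficient matrix `Phi` below is hypothetical, chosen only for illustration, and is not the matrix from the question. A unit eigenvalue of Phi indicates an I(1) system, the rank of Pi = Phi - I gives the number of cointegrating relations, and Pi = alpha beta' yields the error correction form.

```r
# Hypothetical VAR(1) coefficient matrix (illustration only).
Phi <- matrix(c(0.50, 0.50,
                0.25, 0.75), nrow = 2, byrow = TRUE)

# Eigenvalues of Phi: one equal to 1 means the system has a unit root.
ev <- sort(Re(eigen(Phi)$values), decreasing = TRUE)

# Error correction form: diff(Z_t) = Pi %*% Z_{t-1} + a_t with Pi = Phi - I.
Pi <- Phi - diag(2)
rank_Pi <- qr(Pi)$rank          # rank 1 => one cointegrating relation

# Factorisation Pi = alpha %*% t(beta), with beta normalised on the first variable.
beta  <- c(1, -1)               # cointegrating vector for this hypothetical Phi
alpha <- c(-0.50, 0.25)         # adjustment coefficients
factorisation_ok <- isTRUE(all.equal(Pi, outer(alpha, beta)))

print(ev)
print(rank_Pi)
print(factorisation_ok)
```

For this hypothetical matrix the eigenvalues are 1 and 0.25, Pi has rank 1, and the first component minus the second is the stationary combination.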
5. Quarterly, seasonally adjusted real U.S. money (M1), GNP in 1982 dollars (gnp), the discount
rate on 91-day treasury bills (rd), and the yield on long-term treasury bonds (rb) for
1954Q1-1987Q4 are considered to see how these variables affect each other.

a) What features of the data are evident in the time series plots and related statistics that
should be captured by a model? Is there other evidence that would be useful in
formulating a modeling strategy?

b) Interpret the output below. Explain the next step.
> pp.test(usmoney[,1])

Dickey-Fuller Z(alpha) = -2.583, Truncation lag parameter = 4, p-value = 0.9512

> pp.test(usmoney[,2])

Dickey-Fuller Z(alpha) = -11.685, Truncation lag parameter = 4, p-value = 0.442

> pp.test(usmoney[,3])

Dickey-Fuller Z(alpha) = -13.987, Truncation lag parameter = 4, p-value = 0.3093

> pp.test(usmoney[,4])

Dickey-Fuller Z(alpha) = -11.355, Truncation lag parameter = 4, p-value = 0.461

> pp.test(diff(usmoney[,1]))

Dickey-Fuller Z(alpha) = -53.089, Truncation lag parameter = 4, p-value = 0.01

> pp.test(diff(usmoney[,2]))

Dickey-Fuller Z(alpha) = -100.29, Truncation lag parameter = 4, p-value = 0.01

> pp.test(diff(usmoney[,3]))

Dickey-Fuller Z(alpha) = -100.29, Truncation lag parameter = 4, p-value = 0.01

> pp.test(diff(usmoney[,4]))

Dickey-Fuller Z(alpha) = -107.51, Truncation lag parameter = 4, p-value = 0.01
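The Phillips-Perron results can be summarised with a simple decision rule: a series is treated as I(1) when the unit root null is not rejected in levels but is rejected after first differencing. The sketch below retypes the printed p-values; the column order M1, gnp, rd, rb is an assumption based on the order of the variables in the later outputs, and the 5% level is an assumed convention.

```r
# p-values copied from the pp.test outputs above (levels and first differences).
pp <- data.frame(
  series  = c("M1", "gnp", "rd", "rb"),   # assumed column order of usmoney
  p_level = c(0.9512, 0.442, 0.3093, 0.461),
  p_diff  = c(0.01, 0.01, 0.01, 0.01),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # assumed significance level

# I(1): unit root not rejected in levels, rejected in first differences.
pp$order <- ifelse(pp$p_level > alpha & pp$p_diff <= alpha, "I(1)",
            ifelse(pp$p_level <= alpha, "I(0)", "undetermined"))
print(pp)
```

Under this rule all four series are classified as I(1), which is what motivates checking for cointegration before fitting a VAR in levels or in differences.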

c) Suppose that we fit a VAR(p) model. Based on the results shown below, what will be your next
step of analysis? (State your comments after each bolded command line.)

> VARselect(usmoney,lag.max=5,type="both")

AIC(n)  HQ(n)  SC(n) FPE(n)
     3      2      2      3

> est=VAR(usmoney,p= ,type="none")

> normality.test(est, multivariate.only=T)


$JB

JB-Test (multivariate)

data: Residuals of VAR object est


Chi-squared = 217.78, df = 8, p-value < 2.2e-16

$Skewness

Skewness only (multivariate)

data: Residuals of VAR object est


Chi-squared = 4.548, df = 4, p-value = 0.3369

$Kurtosis

Kurtosis only (multivariate)

data: Residuals of VAR object est


Chi-squared = 213.23, df = 4, p-value < 2.2e-16

> serial.test(est,lags.pt=8,type="PT.asymptotic")

Portmanteau Test (asymptotic)

data: Residuals of VAR object est


Chi-squared = 146.6, df = 96, p-value = 0.0006833

> arch.test(est)

ARCH (multivariate)

data: Residuals of VAR object est

Chi-squared = 812.7, df = 500, p-value < 2.2e-16
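The degrees of freedom printed by these VAR diagnostics follow from the dimension of the system, which can help when commenting on each test. The sketch below assumes K = 4 variables, the textbook formulas K^2(h - p) for the asymptotic portmanteau test, 2K for the multivariate Jarque-Bera test and q(K(K+1)/2)^2 for the multivariate ARCH test, and q = 5 ARCH lags (believed to be the default in `arch.test`). The lag order `p_lag = 2` is an inference from the printed df = 96, not something stated in the exam.

```r
K     <- 4   # number of series in usmoney
h     <- 8   # lags.pt used in serial.test above
p_lag <- 2   # assumed VAR order, inferred from df = 96 (not given in the source)
q     <- 5   # assumed number of lags in the multivariate ARCH test

df_portmanteau <- K^2 * (h - p_lag)          # asymptotic portmanteau test
df_jb          <- 2 * K                      # multivariate Jarque-Bera test
df_arch        <- q * (K * (K + 1) / 2)^2    # multivariate ARCH-LM test
print(c(portmanteau = df_portmanteau, JB = df_jb, ARCH = df_arch))

# p-value of the portmanteau statistic printed above.
p_portmanteau <- pchisq(146.6, df = df_portmanteau, lower.tail = FALSE)
print(p_portmanteau)
```

All three residual tests reject at conventional levels (non-normality driven by kurtosis rather than skewness, remaining serial correlation, and ARCH effects), so the fitted VAR would need revision before being used for inference.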

d) What is your conclusion from the following bivariate Granger causality test results?
> granger.test(usmoney,2)
F-statistic p-value
gnp -> M1 2.6367332 7.545159e-02
rd -> M1 38.7379382 6.661338e-14
rb -> M1 43.9400152 2.775558e-15
M1 -> gnp 8.4407533 3.589392e-04
rd -> gnp 13.7400155 3.894357e-06
rb -> gnp 2.4516549 9.015642e-02
M1 -> rd 3.3934523 3.661867e-02
gnp -> rd 2.5298032 8.362174e-02
rb -> rd 1.3222158 2.701317e-01
M1 -> rb 0.3351329 7.158639e-01
gnp -> rb 1.9896483 1.409179e-01
rd -> rb 1.6315008 1.996463e-01
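A compact way to read the Granger table is to filter the pairs by p-value and then look for pairs that are significant in both directions. The sketch below retypes the p-values from the output above; the 5% threshold is an assumed convention.

```r
# Cause, effect and p-value for each row of the granger.test output above.
gc <- data.frame(
  cause  = c("gnp", "rd", "rb", "M1", "rd", "rb", "M1", "gnp", "rb", "M1", "gnp", "rd"),
  effect = rep(c("M1", "gnp", "rd", "rb"), each = 3),
  p      = c(7.545159e-02, 6.661338e-14, 2.775558e-15,
             3.589392e-04, 3.894357e-06, 9.015642e-02,
             3.661867e-02, 8.362174e-02, 2.701317e-01,
             7.158639e-01, 1.409179e-01, 1.996463e-01),
  stringsAsFactors = FALSE
)

# Directions that are significant at the assumed 5% level.
sig   <- gc[gc$p < 0.05, ]
links <- paste(sig$cause, "->", sig$effect)

# A feedback relation exists when both directions of a pair are significant.
reverse  <- paste(sig$effect, "->", sig$cause)
feedback <- links[links %in% reverse]

print(links)
print(feedback)
```

At this threshold five directions are significant, and only M1 and rd show feedback in both directions. Nothing is found to Granger-cause rb.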

e) Explain the purpose of the following R commands (bold lines). Comment on results. What is
the cointegration rank of the model? What are the cointegration vectors of the model?

> model=ca.jo(usmoney,ecdet="none",K=2)

> summary(model)

######################
# Johansen-Procedure #
######################

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 3 |  1.07  6.50  8.18 11.65
r <= 2 |  2.98 12.91 14.90 19.19
r <= 1 | 15.23 18.90 21.07 25.75
r = 0  | 44.47 24.78 27.14 32.14
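The Johansen table is read sequentially, starting from r = 0 and moving upward until the first null hypothesis that cannot be rejected; that value of r is taken as the cointegration rank. The sketch below applies this rule to the printed statistics, using the 5% critical values (the choice of level is an assumption).

```r
# Test statistics and 5% critical values copied from the table above,
# ordered from r = 0 upward.
jo <- data.frame(
  null = c("r = 0", "r <= 1", "r <= 2", "r <= 3"),
  r0   = 0:3,
  stat = c(44.47, 15.23, 2.98, 1.07),
  cv5  = c(27.14, 21.07, 14.90, 8.18),
  stringsAsFactors = FALSE
)

# Reject a null when the statistic exceeds its critical value.
jo$reject <- jo$stat > jo$cv5

# The rank is the first r0 whose null is not rejected (full rank if all are rejected).
rank_est <- if (all(jo$reject)) nrow(jo) else jo$r0[which(!jo$reject)[1]]
print(jo)
print(rank_est)
```

Here r = 0 is rejected (44.47 > 27.14) while r <= 1 is not (15.23 < 21.07), giving one cointegrating relation, which matches the `r=1` used in the `cajorls` call.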

> cajorls(model,r=1)

$rlm

Call:

lm(formula = substitute(form1), data = data.mat)

Coefficients:
          M1.d        gnp.d       rd.d        rb.d
ect1       4.778e-03   5.141e-02  -4.347e-06  -6.699e-06
constant  -1.898e+00  -1.302e+01   1.434e-03   3.667e-03
M1.dl1     4.032e-01   5.318e-01   4.386e-04   1.238e-04
gnp.dl1    1.057e-02   3.408e-02   4.031e-05   2.805e-05
rd.dl1    -2.483e+02  -3.091e+02   4.270e-02   5.522e-02
rb.dl1    -4.094e+02   1.494e+03   2.884e-01   6.086e-02

$beta
ect1
M1.l2 1.000000e+00
gnp.l2 -9.728465e-02
rd.l2 -2.067023e+04
rb.l2 2.308650e+04
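The `$beta` column defines the error correction term as a linear combination of the lagged levels, and the `ect1` row of the coefficient table gives the speed at which each differenced variable responds to it. The sketch below evaluates both for one hypothetical observation; the values in `x_lag` are invented for illustration and are not taken from the usmoney data.

```r
# Normalised cointegrating vector from $beta above.
beta <- c(M1 = 1.000000e+00, gnp = -9.728465e-02,
          rd = -2.067023e+04, rb = 2.308650e+04)

# Adjustment (loading) coefficients from the ect1 row of $rlm above.
alpha <- c(M1 = 4.778e-03, gnp = 5.141e-02, rd = -4.347e-06, rb = -6.699e-06)

# Hypothetical lagged levels (illustration only).
x_lag <- c(M1 = 450, gnp = 3000, rd = 0.05, rb = 0.07)

# Error correction term: ect = M1 - 0.0973*gnp - 20670.23*rd + 23086.50*rb.
ect <- sum(beta * x_lag)

# Contribution of the disequilibrium to each differenced equation.
adjustment <- alpha * ect
print(ect)
print(adjustment)
```

The very large coefficients on rd and rb relative to M1 and gnp suggest that the interest rate series are measured on a much smaller scale than the other two, which is worth keeping in mind when interpreting the cointegrating vector.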
