Unit Root Testing in AR(1) Processes
BACHELOR OF SCIENCE
in
TECHNISCHE WISKUNDE (Applied Mathematics)
by
Delft, The Netherlands
January 2015
Supervisor
Other committee members
The purpose of this study is to investigate the asymptotics of a first order auto regressive unit root process, AR(1). The goal is to determine which tests can be used to test for the presence of a unit root in a first order auto regressive process. A unit root is present when the root of the characteristic equation of this process equals unity. In order to test for the presence of a unit root, we first develop an understanding of the characteristics of the AR(1) process, so that the difference between a trend stationary process and a unit root process becomes clear.
The first test that will be examined is the Dickey-Fuller test. The estimator of this test is based on Ordinary Least Squares Regression and a t-test statistic, which is why we have computed an ordinary least squares estimator and the corresponding test statistic to test for the presence of a unit root in the first order auto regressive process. Furthermore, we examined the consistency of this estimator and its asymptotic properties. The limiting distribution of the test statistic is known as the Dickey-Fuller distribution. With a Monte Carlo approach, we implemented the Dickey-Fuller test statistic in Matlab and computed the (asymptotic) power of this test. Under the assumption of Gaussian innovations (or shocks) the limiting distribution of the unit root process is the same as without the normality assumption. When there is reason to assume Gaussianity of the innovations, the Likelihood Ratio test can be used to test for a unit root.
The asymptotic power envelope is obtained with the help of the Likelihood Ratio test, since the Neyman-Pearson lemma states that the Likelihood Ratio test is the point optimal test for simple hypotheses. By calculating the likelihood functions the test statistic was obtained, such that an explicit formula for the power envelope was found. Since each fixed alternative results in a different critical value, and thus in a different unit root test, there is no uniformly most powerful test available. Instead we are interested in asymptotically point optimal tests, and we will analyze which of these point optimal tests is the overall best performing test. By comparing the asymptotic power curve to the asymptotic power envelope for each fixed alternative, we could draw a conclusion on which fixed alternative results in the overall best performing test.
On the basis of the results of this research, it can be concluded that there does not exist a uniformly most powerful test; nonetheless, we can define an overall best performing test.
Contents
List of Tables
List of Figures
1 Introduction
6.4 The Asymptotic Power Envelope
6.5 Analytic Solution to the Ornstein-Uhlenbeck Process
6.6 Asymptotically Point Optimal Unit Root Test
7 Summary of Results
8 Discussion
Bibliography
Appendices
A Auxiliary results
A.1 Central Limit Theorem
A.2 Functional Central Limit Theorem
A.3 Continuous Mapping Theorem
A.4 Neyman-Pearson Lemma
List of Tables
4.1 Critical values k_T^α of t_T for several significance levels α and sample sizes T.
5.1 The power of the Dickey-Fuller test for finite sample sizes T at significance level α = 0.05.
5.2 The power of the Dickey-Fuller test for N(0, 1) innovations at nominal significance level α = 0.05.
5.3 The power of the Dickey-Fuller test for N(0, σ² = 2²) innovations at nominal significance level α = 0.05.
5.4 The power of the Dickey-Fuller test for several innovations and large sample size T = 500 with nominal significance level α = 0.05.
List of Figures
2.1 A stationary AR(1) process (φ = 0.5) and a unit root process (φ = 1).
2.2 Trend stationary process compared to a unit root process.
5.1 The power of the Dickey-Fuller test at significance level α = 0.05 for T = 25, 50, 100, 250.
5.2 Power of the Dickey-Fuller test for local alternatives φ_1 = 1 + c/T close to unity.
6.1 The power of the test close to unity corresponds to the asymptotic size α = 0.05.
6.2 The asymptotic power envelope.
Chapter 1
Introduction
The statistical analysis of a stationary time series is more straightforward and better studied than the analysis of a non-stationary time series. That is why, in order to perform regression analysis on the data, the raw data is often transformed into a stationary process. Roughly speaking, a stationary process is a process whose statistical properties do not change over time; that is, the mean and the variance are constant. For many real applications the assumption of stationarity is not valid, and therefore stationarity of the data needs to be tested. One possible reason for a time series to be non-stationary is the presence of a unit root. The following example illustrates the importance of testing whether a unit root is present.
Figure 1.1: The unemployment rate of the Netherlands from 2006 to 2014.
Two competing economic theories describe the behavior of the unemployment rate: the Natural Rate Hypothesis (NRH) and the Hysteresis Hypothesis (HH). The NRH states that the unemployment rate fluctuates
around a certain rate, the natural rate. The unemployment rate can temporarily deviate
from the natural rate due to for example an exogenous shock, but in the long run it will
revert to its natural rate. The NRH theory is therefore consistent with the absence of
a unit root in the unemployment rate time series. Opposed to the NRH, the Hysteresis
Hypothesis states that there does not exist an equilibrium level of the unemployment rate,
with the consequence that a shock has a permanent effect on the unemployment rate time
series. From a statistical point of view this implies that if the HH theory holds the time
series contains a unit root.
Just as in the rest of the European Union, the recession starting in 2008 had a huge negative effect on the Dutch economy. One way to mitigate an economic recession in a country is to keep the inflation rate low. The positive effect of a low inflation rate is the opportunity for the labor market to adjust quickly to negative changes. Monetary authorities such as De Nederlandsche Bank (the Dutch central bank) have the power to keep the inflation rate stable and low. In particular, a contractionary monetary policy aims to reduce the growth of the money supply such that the rate of inflation will stop growing or will even shrink. However, a contractionary monetary policy does not only have this positive effect on the inflation rate. The undesirable effect of this policy is that it has the tendency to increase the unemployment rate of a country. Because of these policy implications, the decision whether or not the unemployment rate time series has a unit root is very important for a monetary authority. If the NRH theory holds, the rise in the unemployment rate due to a contractionary monetary policy does not have a permanent effect, and the time series will eventually revert to its natural rate. However, if the HH theory is correct, the adverse effect of the monetary policy is permanent, which will lead to a permanently higher unemployment rate in the Netherlands. Therefore we can conclude that whether or not the unemployment rate has a unit root is a key element in designing an optimal policy by De Nederlandsche Bank.
The unemployment rate is not the only time series that could be non-stationary; in fact, many time series such as exchange rates, inflation rates and real output should be tested for stationarity. The nature of the non-stationarity can differ: seasonality of the data, the presence of a deterministic trend, the presence of a stochastic trend and structural breaks are all examples of non-stationarity. Seasonality, structural breaks and the presence of a deterministic trend in time series are very interesting, yet complicated topics in their own right, and the present thesis will not discuss them, since it focuses on the unit root problem. Since testing for a unit root is important, this thesis will present two possible unit root tests: the Dickey-Fuller test and the Likelihood Ratio test.
If a time series is non-stationary, the standard statistical analysis is not valid. The most common methods of statistical analysis rely on the Law of Large Numbers and the Central Limit Theorem, and both theorems require the assumption that the time series is stationary; applying the standard methods of statistical analysis to a non-stationary time series will therefore give incorrect results. This thesis is a study of the asymptotic properties of inference procedures designed for non-stationary time series. In large sample theory (asymptotic theory) the properties of an estimator and test statistic are examined as the sample size becomes indefinitely large. The idea is that the properties of an estimator when the sample size becomes arbitrarily large are comparable to the properties of the estimator when the sample is finite.
The unit root problem is a well-studied topic in econometrics. For example, Elliott, Stock and Rothenberg (1992) [4] studied the efficiency of tests for auto regressive unit root processes, while earlier Dickey and Fuller (1979) [1] studied the distribution of the estimators for auto regressive time series with a unit root. Of course, there are results for more complex time series than the first order auto regressive process, but they are beyond the scope of this thesis. With this thesis, we aim to examine a non-standard problem in a simple case and show how non-stationarity of a time series can influence statistical analysis.
First we will give a brief introduction to the first order auto regressive process and the unit root process, and explain the difference between a trend stationary process and a unit root process. The basics of unit root testing, such as the hypotheses of interest, ordinary least squares regression and the asymptotic properties of the unit root process, are dealt with in Chapter 3. The Dickey-Fuller test is examined in Chapters 4 and 5. The Likelihood Ratio test is dealt with in Chapter 6, together with the computation of the asymptotic power envelope. The last chapter summarizes the results on the asymptotic properties of the Dickey-Fuller test and the Likelihood Ratio test.
Chapter 2
Substituting for Y_{t-2} and iterating backwards yields
$$Y_t = \phi^t Y_0 + \sum_{i=0}^{t-1}\phi^i \epsilon_{t-i}. \qquad (2.1.4)$$
The factor φ has a strong effect on the behavior of the AR(1) process. We distinguish
three cases:
• |φ| < 1;
• |φ| > 1;
• φ = 1.
Let us consider the first case, −1 < φ < 1. The weights on past shocks in (2.1.4) decay geometrically, so shocks that occurred far in the past have no significant influence on the behavior of the time series: the weight given to a shock which occurred far in the past is extremely small. Therefore the time series has a long term mean and is stationary. On the other hand, if |φ| > 1 the observation at time t, Y_t, will be large, and the weight given to a shock a long time ago will be greater than the weight given to recent shocks. In the long run this process will explode; clearly this process is non-stationary. Finally, let us consider the case in which φ = 1. In this case the process is non-stationary and behaves as a random walk process. We will discuss this process in detail later on. Mathematically we can show (weak) stationarity of the process in the case |φ| < 1 by computing its mean and variance. If |φ| < 1 we can rewrite the model in the following way:
$$Y_t = \phi Y_{t-1} + \epsilon_t = \epsilon_t + \phi\epsilon_{t-1} + \phi^2\epsilon_{t-2} + \cdots. \qquad (2.1.5)$$
By taking expectations we obtain E[Y_t] = E[ε_t] + φE[ε_{t-1}] + ··· = 0, so the mean is constant. For the variance,
$$\mathrm{Var}(Y_t) = \mathrm{Var}(\phi Y_{t-1} + \epsilon_t) = \mathrm{Var}(\phi Y_{t-1}) + \mathrm{Var}(\epsilon_t) = \phi^2\,\mathrm{Var}(Y_{t-1}) + \sigma^2. \qquad (2.1.7)$$
Under the stationarity assumption Var(Y_t) = Var(Y_{t-1}). Substituting Var(Y_{t-1}) by Var(Y_t) in (2.1.7) and solving, we obtain
$$\mathrm{Var}(Y_t) = \frac{\sigma^2}{1 - \phi^2}.$$
Since Var(Y_t) > 0, it follows that 1 − φ² > 0, and we see that the stationarity assumption is satisfied for |φ| < 1. Therefore the process is stationary if |φ| < 1. We conclude that the mean and the variance of an AR(1) process with |φ| < 1 are constant and thus {Y_t} is stationary (see Definition 2.1.2).
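This variance formula is easy to verify numerically. The thesis's simulations were done in Matlab; the following Python sketch (not part of the thesis, with illustrative parameter values) simulates a long stationary AR(1) path and compares the sample variance to σ²/(1 − φ²):

```python
import numpy as np

# Sketch: simulate a stationary AR(1) process with |phi| < 1 and check that
# the sample variance is close to sigma^2 / (1 - phi^2).
rng = np.random.default_rng(0)
phi, sigma, T = 0.5, 1.0, 200_000

eps = rng.normal(0.0, sigma, T)
Y = np.empty(T)
Y[0] = eps[0]
for t in range(1, T):
    Y[t] = phi * Y[t - 1] + eps[t]

burn = 1_000                       # discard start-up effects
sample_var = Y[burn:].var()
theory_var = sigma**2 / (1 - phi**2)
print(sample_var, theory_var)      # both close to 4/3
```

For φ = 0.5 and σ = 1 the theoretical variance is 1/(1 − 0.25) ≈ 1.33, and the sample variance of a long simulated path lands very close to it.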
Figure 2.1: A stationary AR(1) process (φ = 0.5) and a unit root process (φ = 1). [Time series plot; Time 0–100 on the horizontal axis, Observation on the vertical axis.]
Figure 2.1 indicates the difference between a stationary AR(1) process and a unit root process, since these time series are simulated with the same innovations {ε_t}. In the stationary case, the shocks do not have a permanent effect, while in the unit root process the shocks do have a permanent effect on the behavior of the process. A process can be stationary in two senses: weakly stationary or strictly (strongly) stationary, according to the following definitions:

Definition 2.1.2. A stochastic process {Y_t} is weakly stationary when it meets the following properties:
• E[Y_t] = μ, ∀t ∈ N;
• Var(Y_t) = σ_Y² < ∞, ∀t ∈ N;
• Cov(Y_t, Y_{t+h}) depends only on the lag h.

Definition 2.1.3. A stochastic process {Y_t} is strictly (or strongly) stationary when the joint distribution of (Y_{t1}, Y_{t2}, ..., Y_{tk}) equals the joint distribution of (Y_{t1+h}, Y_{t2+h}, ..., Y_{tk+h}) for every shift h and every finite collection of indices t1, ..., tk.

By Definition 2.1.2 of weak stationarity we conclude that the AR(1) process with |φ| < 1 is a weakly stationary process.
Define the lag operator L by
$$LY_t \equiv Y_{t-1}.$$
In the case of the AR(1) process, the lag polynomial notation gives
$$(1 - \phi L)Y_t = \epsilon_t,$$
with characteristic equation
$$1 - \phi z = 0. \qquad (2.1.10)$$
Thus if the AR(1) process has a unit root, i.e. z = 1 is a root of the characteristic equation (2.1.10), φ must equal 1:
$$1 - \phi z = 0 \;\Leftrightarrow\; z = \frac{1}{\phi}, \quad \text{so } z = 1 \Leftrightarrow \phi = 1.$$
If φ = 1 the process is non-stationary. This is easy to verify by computing the variance, since in the case of φ = 1,
$$Y_t = \sum_{i=0}^{t} \epsilon_i. \qquad (2.1.11)$$
2.2 Trend Stationary Process vs. Unit Root Process
In economics, many time series are not stationary. In general we distinguish between two cases, the Trend Stationary Process and the Unit Root Process:
• Trend Stationary Process: a non-stationary process that is stationary around a deterministic trend;
• Unit Root Process: a non-stationary process with a stochastic trend or a unit root.
Figure 2.2: Trend stationary process compared to a unit root process. [Time series plot; Time 0–100 on the horizontal axis, Observation on the vertical axis.]
In a time series a unit root and a deterministic trend could both be present; in that case the process satisfies equation (2.1.1) with φ = 1 and γ ≠ 0. In this thesis we will not analyze this special case, but in Chapter 8 we have added explanatory notes on this process. From Figure 2.2 we conclude that in the trend stationary process a positive trend is present, while in the unit root process there seems to be no deterministic trend. The trend stationary process is given by
$$Y_t = m + \gamma t + \phi Y_{t-1} + \epsilon_t. \qquad (2.2.1)$$
In this equation the additional term γt represents the deterministic linear trend, which is independent of the stochastic term Y_t, and m represents the intercept. Computing the mean of this trend stationary process shows that the mean contains a linear trend dependent on γ, so the mean is not constant. We conclude that the trend stationary process is not stationary.
A unit root process can be made stationary by differencing. For example, consider the first order auto regressive process AR(1) with φ = 1, for which
$$\Delta Y_t = Y_t - Y_{t-1} = \epsilon_t.$$
Since ε_t has constant variance and zero mean, the differenced process is stationary, so we write Y_t ∼ I(1): the process is integrated of order one and a unit root is present. In conclusion, we have seen that a unit root process is a non-stationary process, since the variance of this process (as obtained in equation (2.1.12)) equals Var(Y_t) = tσ² and increases as t becomes larger.
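The contrast between the growing variance of the random walk and the constant variance of its first difference can be illustrated with a short simulation. A Python sketch (not from the thesis; parameter values are illustrative):

```python
import numpy as np

# Sketch: differencing a unit root process (phi = 1) yields a stationary
# series. The variance of the random walk grows like t * sigma^2, while the
# variance of the differenced series is constant (sigma^2).
rng = np.random.default_rng(1)
sigma, T, reps = 1.0, 400, 5_000

eps = rng.normal(0.0, sigma, (reps, T))
Y = eps.cumsum(axis=1)             # random walk: Y_t = eps_1 + ... + eps_t
dY = np.diff(Y, axis=1)            # Delta Y_t = eps_t

var_at_t = Y.var(axis=0)           # grows roughly linearly in t
var_diff = dY.var()                # roughly sigma^2, constant
print(var_at_t[99], var_at_t[399], var_diff)
```

Across the 5,000 replications the variance at t = 100 is near 100σ² and at t = 400 near 400σ², while the variance of the differenced series stays near σ².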
Chapter 3
The previous chapter discussed two types of non-stationary processes, a trend stationary
process and a unit root process. In order to determine whether a time series contains a
stochastic trend (a unit root) we perform a unit root test. The unit root test tests the null hypothesis of the presence of a unit root in the AR(1) process against the alternative hypothesis that the process has no unit root and as a result is stationary. A unit root test is used to test the following hypotheses:
$$H_0: \phi = 1 \quad \text{versus} \quad H_1: \phi < 1.$$
The Dickey-Fuller test statistic is based on the Ordinary Least Squares estimator. In this chapter we will introduce the Ordinary Least Squares (OLS) estimator for φ and we will specify the distribution of this OLS estimator. Furthermore we will show that this estimator is consistent. Consider the AR(1) process
$$Y_t = \phi Y_{t-1} + \epsilon_t.$$
With the method of Ordinary Least Squares Regression we can compute an estimator φ̂_T for the parameter of interest. The idea is to minimize the sum of the squared residuals (SSR) with respect to φ̂_T,
$$\mathrm{SSR} = \sum_{t=1}^{T} \hat\epsilon_t^2 = \sum_{t=1}^{T} (Y_t - \hat\phi_T Y_{t-1})^2, \qquad (3.1.1)$$
where T represents the sample size. The goal is to find the value φ̂_T that minimizes the SSR. Differentiating,
$$\frac{\partial(\mathrm{SSR})}{\partial \hat\phi_T} = \frac{\partial}{\partial \hat\phi_T}\left[\sum_{t=1}^{T}(Y_t - \hat\phi_T Y_{t-1})^2\right] \qquad (3.1.2)$$
$$= \sum_{t=1}^{T}\frac{\partial}{\partial \hat\phi_T}(Y_t - \hat\phi_T Y_{t-1})^2 = \sum_{t=1}^{T} -2\, Y_{t-1}(Y_t - \hat\phi_T Y_{t-1}). \qquad (3.1.3)$$
In order to minimize the SSR we set the partial derivative (3.1.3) equal to zero and solve for φ̂_T:
$$\frac{\partial(\mathrm{SSR})}{\partial \hat\phi_T} = 0 \;\Leftrightarrow\; \sum_{t=1}^{T} -2\, Y_{t-1}(Y_t - \hat\phi_T Y_{t-1}) = 0 \qquad (3.1.4)$$
$$\Leftrightarrow\; \sum_{t=1}^{T} Y_{t-1} Y_t = \hat\phi_T \sum_{t=1}^{T} Y_{t-1}^2. \qquad (3.1.5)$$
The Ordinary Least Squares estimator φ̂_T which minimizes the sum of the squared residuals is therefore
$$\hat\phi_T = \frac{\sum_{t=1}^{T} Y_{t-1} Y_t}{\sum_{t=1}^{T} Y_{t-1}^2}. \qquad (3.1.9)$$
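As a sanity check (not part of the thesis, which used Matlab), the estimator (3.1.9) can be computed directly from a simulated AR(1) sample; for a stationary φ the estimate should be close to the true value. A Python sketch, assuming Y_0 = 0:

```python
import numpy as np

# Sketch: the OLS estimator (3.1.9) for an AR(1) sample, with Y_0 = 0.
def ols_phi(Y):
    """phi_hat = sum(Y_{t-1} Y_t) / sum(Y_{t-1}^2), prepending Y_0 = 0."""
    Ylag = np.concatenate(([0.0], Y[:-1]))
    return (Ylag * Y).sum() / (Ylag**2).sum()

rng = np.random.default_rng(2)
phi, T = 0.8, 10_000
eps = rng.normal(size=T)
Y = np.empty(T)
Y[0] = eps[0]
for t in range(1, T):
    Y[t] = phi * Y[t - 1] + eps[t]

print(ols_phi(Y))   # close to the true phi = 0.8
```

With T = 10,000 observations the standard error implied by (3.1.10) is roughly √((1 − φ²)/T) ≈ 0.006, so the estimate lands very close to 0.8.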
Having calculated an estimator for φ by the method of Ordinary Least Squares Regression, it is time to focus on the asymptotic properties of this estimator. By the Central Limit Theorem (A.1) we know that if the process is stationary (|φ| < 1), then for T → ∞,
$$\sqrt{T}(\hat\phi_T - \phi) \xrightarrow{d} N(0,\, 1 - \phi^2). \qquad (3.1.10)$$
Under the null hypothesis Y_t is a non-stationary process, so we are interested in the asymptotic distribution of φ̂_T when φ = 1. Under the null hypothesis we simply cannot use the Central Limit Theorem in the way we used it earlier. Note that for φ = 1, (3.1.10) would imply
$$\sqrt{T}(\hat\phi_T - \phi) = \sqrt{T}(\hat\phi_T - 1) \xrightarrow{d} N(0, 0) = 0, \qquad (3.1.11)$$
and we obtain a degenerate limiting distribution. A degenerate distribution is a probability distribution concentrated on a single value; in our case it is centered at zero with zero variance. Our aim is to find a non-degenerate asymptotic distribution for φ̂_T under the null hypothesis, and therefore we have to rescale the OLS estimator (3.1.9). It turns out that we need to multiply (φ̂_T − φ) by T rather than by √T. To show why scaling with T is needed under H_0 (i.e. φ = 1), note that under the null hypothesis the difference between φ̂_T and φ equals
$$\hat\phi_T - \phi = \hat\phi_T - 1 = \frac{\sum_{t=1}^{T} Y_{t-1}\epsilon_t}{\sum_{t=1}^{T} Y_{t-1}^2}. \qquad (3.1.12)$$
Multiplying (3.1.12) by T and rewriting with the normalizations 1/T and 1/T² gives
$$T(\hat\phi_T - 1) = T\left(\frac{\sum_{t=1}^{T} Y_{t-1}Y_t}{\sum_{t=1}^{T} Y_{t-1}^2} - 1\right) = \frac{\frac{1}{T}\sum_{t=1}^{T} Y_{t-1}Y_t}{\frac{1}{T^2}\sum_{t=1}^{T} Y_{t-1}^2} - T. \qquad (3.1.13)$$
If we replace 1 with
$$1 = \frac{\sum_{t=1}^{T} Y_{t-1}^2}{\sum_{t=1}^{T} Y_{t-1}^2} \qquad (3.1.14)$$
and substitute in (3.1.13), we obtain
$$T(\hat\phi_T - 1) = \frac{\frac{1}{T}\left(\sum_{t=1}^{T} Y_{t-1}Y_t - \sum_{t=1}^{T} Y_{t-1}^2\right)}{\frac{1}{T^2}\sum_{t=1}^{T} Y_{t-1}^2} = \frac{\frac{1}{T}\sum_{t=1}^{T} Y_{t-1}\left[Y_t - Y_{t-1}\right]}{\frac{1}{T^2}\sum_{t=1}^{T} Y_{t-1}^2} = \frac{\frac{1}{T}\sum_{t=1}^{T} \Delta Y_t\, Y_{t-1}}{\frac{1}{T^2}\sum_{t=1}^{T} Y_{t-1}^2}. \qquad (3.1.15)$$
Recall that we work under the null hypothesis, so that we can substitute
$$\Delta Y_t = Y_t - Y_{t-1} = \epsilon_t,$$
and therefore (3.1.15) yields
$$T(\hat\phi_T - 1) = \frac{\frac{1}{T}\sum_{t=1}^{T} \epsilon_t Y_{t-1}}{\frac{1}{T^2}\sum_{t=1}^{T} Y_{t-1}^2}. \qquad (3.1.16)$$
With this result we can now examine if the limiting distribution of this estimator is non-
degenerate. In the next section we will examine the asymptotic properties of the OLS
estimator and we will show that by scaling (φ̂T − 1) with T we obtain a non-degenerate
distribution of the OLS estimator.
$$Y_t = Y_{t-1} + \epsilon_t = Y_0 + \epsilon_1 + \epsilon_2 + \cdots + \epsilon_t. \qquad (3.2.1)$$
Since we assume Y0 = 0 the process is the sum of random IID innovations. The aim
is to find the asymptotic properties of the AR(1) process with φ = 1 by means of the
asymptotic properties of a random walk. First we will introduce the Wiener Process. A
Wiener Process is a continuous time stochastic process satisfying the following definition:
Definition 3.2.1 (Standard Brownian Motion (Wiener Process)). ¹
A continuous-time stochastic process {W(t)}_{t≥0} is called a Standard Brownian Motion (Wiener Process) when it meets the following properties:
• W(0) = 0;
• for any collection of dates 0 ≤ t_1 < t_2 < ··· < t_k, the increments [W(t_2) − W(t_1)], [W(t_3) − W(t_2)], ..., [W(t_k) − W(t_{k−1})] are independent;
• W(t + s) − W(t) ∼ N(0, s) for s > 0.
The Wiener Process is closely related to a random walk. According to Donsker's Theorem, or the Functional Central Limit Theorem (A.2), a discrete random walk approaches a Standard Brownian Motion as the number of steps increases (t → ∞) and the step size becomes smaller. As a result, the Wiener Process is the scaling limit of a random walk. The following proposition holds for a random walk:
Proposition 3.2.1 (Convergence of a random walk). Suppose ψ_t is a random walk,
$$\psi_t = \psi_{t-1} + u_t,$$
where u_t is IID with zero mean and constant variance σ². Then the following properties hold:
$$1.\quad \frac{1}{T}\sum_{t=1}^{T} u_t \psi_{t-1} \xrightarrow{d} \sigma^2 \int_0^1 W(t)\,dW(t) = \tfrac{1}{2}\sigma^2\left(W(1)^2 - 1\right);$$
$$2.\quad \frac{1}{T^2}\sum_{t=1}^{T} \psi_{t-1}^2 \xrightarrow{d} \sigma^2 \int_0^1 W(t)^2\,dt;$$
where {W(t)} denotes a Wiener Process and →d denotes convergence in distribution.

¹ Time Series Analysis, James D. Hamilton [8].
As we have shown in (3.1.16), the deviation of the OLS estimator from the actual value φ satisfies
$$T(\hat\phi_T - 1) = \frac{\frac{1}{T}\sum_{t=1}^{T} \epsilon_t Y_{t-1}}{\frac{1}{T^2}\sum_{t=1}^{T} Y_{t-1}^2}. \qquad (3.2.2)$$
Under the null hypothesis Y_t describes a random walk with IID innovations ε_t with zero mean and constant variance σ². By Proposition 3.2.1 and the Continuous Mapping Theorem (A.3) we may conclude that the asymptotic distribution of (3.2.2) is given by
$$T(\hat\phi_T - 1) \xrightarrow{d} \frac{\frac{1}{2}\left(W(1)^2 - 1\right)}{\int_0^1 W(t)^2\,dt}. \qquad (3.2.3)$$
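The limiting functional in (3.2.3) can be approximated by discretizing the Wiener process, which is how its quantiles are tabulated in practice. A Python sketch (not from the thesis; n and the number of replications are illustrative):

```python
import numpy as np

# Sketch: approximate the limit in (3.2.3) by discretizing a Wiener process
# on [0, 1] with n steps. W(1)^2 is the squared endpoint, and the integral
# of W(t)^2 dt is approximated by a Riemann sum.
rng = np.random.default_rng(4)
n, reps = 2_000, 5_000

draws = np.empty(reps)
for r in range(reps):
    dW = rng.normal(0.0, np.sqrt(1.0 / n), n)
    W = dW.cumsum()                    # W at grid points 1/n, ..., 1
    num = 0.5 * (W[-1]**2 - 1.0)
    den = (W**2).mean()                # Riemann sum of int_0^1 W(t)^2 dt
    draws[r] = num / den

# Quantiles of `draws` approximate the distribution of T(phi_hat - 1);
# the 5% quantile gives the coefficient-statistic critical value.
print(np.quantile(draws, 0.05), np.median(draws))
```

The resulting distribution is markedly asymmetric, with a long left tail and a negative median, which is exactly why standard normal critical values cannot be used for unit root inference.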
Consider the AR(1) process (2.1.1) under the null hypothesis satisfying Assumption 1. Then
$$Y_t = \sum_{s=1}^{t} \phi^{t-s}\epsilon_s = \sum_{s=1}^{t} \epsilon_s = \epsilon_t + \epsilon_{t-1} + \cdots + \epsilon_1. \qquad (3.3.1)$$
When we make the extra assumption ε_t ∼ N(0, σ²), the process {Y_t}_{t≥0} is a sum of Gaussian random variables. Therefore (3.3.1) implies that Y_t is Gaussian with zero mean and variance tσ²:
$$Y_t \sim N(0, t\sigma^2). \qquad (3.3.2)$$
Since Y_t represents a random walk, we can rewrite the squared process Y_t² in the following way:
$$Y_t^2 = (Y_{t-1} + \epsilon_t)^2 = Y_{t-1}^2 + 2Y_{t-1}\epsilon_t + \epsilon_t^2 \;\Leftrightarrow\; Y_{t-1}\epsilon_t = \frac{Y_t^2 - Y_{t-1}^2 - \epsilon_t^2}{2}. \qquad (3.3.3)$$
We are interested in the sum of all the squared observations of the process {Y_t}_{t≥0}, so if we sum (3.3.3) from 1 to T, the sum telescopes:
$$\sum_{t=1}^{T} Y_{t-1}\epsilon_t = \sum_{t=1}^{T} \frac{Y_t^2 - Y_{t-1}^2 - \epsilon_t^2}{2} = \frac{Y_T^2 - Y_0^2}{2} - \frac{\sum_{t=1}^{T}\epsilon_t^2}{2}. \qquad (3.3.4)$$
Using Y_0 = 0 and dividing by σ²T, we obtain
$$\frac{1}{\sigma^2 T}\sum_{t=1}^{T} Y_{t-1}\epsilon_t = \frac{Y_T^2}{2\sigma^2 T} - \frac{1}{2\sigma^2 T}\sum_{t=1}^{T}\epsilon_t^2 = \frac{1}{2}\left(\frac{Y_T}{\sigma\sqrt{T}}\right)^2 - \frac{1}{2\sigma^2 T}\sum_{t=1}^{T}\epsilon_t^2. \qquad (3.3.6)$$
Since Y_T ∼ N(0, σ²T), it follows that Y_T/(σ√T) ∼ N(0, 1). Then, by definition, its square follows a Chi-Squared distribution with one degree of freedom:
$$\left(\frac{Y_T}{\sigma\sqrt{T}}\right)^2 \sim \chi^2(1).$$
Now consider the term Σ_{t=1}^T ε_t². This is a sum of squared IID normal random variables with zero mean and constant variance σ², such that by the Law of Large Numbers²:
$$\frac{1}{T}\sum_{t=1}^{T}\epsilon_t^2 \xrightarrow{p} \sigma^2. \qquad (3.3.7)$$
Combining the previous results, we have shown that
$$\frac{1}{\sigma^2 T}\sum_{t=1}^{T} Y_{t-1}\epsilon_t \xrightarrow{d} \frac{1}{2}\left(Y - 1\right), \qquad (3.3.8)$$
where Y ∼ χ²(1).
As a result we have found the limiting distribution of the numerator of equation (3.2.3) for Gaussian innovations. By the definition of a Wiener Process it follows that W(1)² ∼ χ²(1), which implies that
$$\frac{1}{2}\left(W(1)^2 - 1\right) \overset{d}{=} \frac{1}{2}\left(\chi^2(1) - 1\right), \qquad (3.3.9)$$
such that the limiting distributions of the numerator of equation (3.2.3) for Gaussian and non-Gaussian innovations are indeed the same. By a similar argument as in Section 3.2 we can conclude from the Continuous Mapping Theorem (A.3) and Proposition 3.2.1 that the limiting distribution of T(φ̂_T − 1) satisfies
$$T(\hat\phi_T - 1) = \frac{\frac{1}{T}\sum_{t=1}^{T}\epsilon_t Y_{t-1}}{\frac{1}{T^2}\sum_{t=1}^{T} Y_{t-1}^2} \xrightarrow{d} \frac{\frac{1}{2}\left(\chi^2(1) - 1\right)}{\int_0^1 W(t)^2\,dt}. \qquad (3.3.10)$$

² J. Doob, Stochastic Processes, John Wiley & Sons, 1953 [3].
Hence we can conclude that the OLS estimator φ̂_T is a super-consistent estimator of the true value φ: it converges at rate T rather than the usual √T. With the asymptotic properties of this estimator we are able to examine the asymptotic distribution of a test statistic for the presence of a unit root. Such a test was developed by David Dickey and Wayne Fuller in 1979. In the next chapter we will investigate the Dickey-Fuller test and explain how it is used to test for a unit root. For the special case in which the innovations are Gaussian, ε_t ∼ N(0, σ²), we will construct the Likelihood Ratio test and approximate the asymptotic power of this test.
Chapter 4
In 1979 Wayne Fuller and David Dickey developed a test to examine whether there is
a unit root present in a first order auto regressive process {Yt }t≥0 . This test is named
after the two statisticians and is known as the Dickey-Fuller test. The Dickey-Fuller test
studies the presence of a unit root in the first order auto regressive process (2.1.1), even
if Assumption 1 is not valid. The consideration to include intercept and/or trend results
in three possible auto regressive processes:
• Testing for a unit root:
$$Y_t = \phi Y_{t-1} + \epsilon_t$$
• Testing for a unit root with drift:
$$Y_t = m + \phi Y_{t-1} + \epsilon_t$$
• Testing for a unit root with drift and deterministic time trend:
$$Y_t = m + \phi Y_{t-1} + \gamma t + \epsilon_t$$
Each model results in a different test statistic and different critical values for the
Dickey-Fuller test. That is why it is important for practical reasons to select the correct
underlying auto regressive process before performing a unit root test. Clearly, the first
two processes are simplifications of the third more general auto regressive process. The
AR(1) process under Assumption 1 corresponds to the first model and we will perform
a Monte Carlo simulation to obtain the (asymptotic) critical values for the Dickey-Fuller
test statistic introduced in the next section.
Consider the AR(1) process
$$Y_t = \phi Y_{t-1} + \epsilon_t, \qquad (4.1.1)$$
where ε_t ∼ WN(0, σ²). In this section we will examine the OLS estimator and we will
introduce the test statistic that we use for testing whether there is a unit root present.
In Chapter 3 we found the OLS estimator φ̂_T to be
$$\hat\phi_T = \frac{\sum_{t=1}^{T} Y_{t-1}Y_t}{\sum_{t=1}^{T} Y_{t-1}^2}.$$
Let us define the standard t-statistic t_T,
$$t_T = \frac{\hat\phi_T - 1}{\hat\sigma_{\hat\phi_T}}, \qquad (4.1.2)$$
where σ̂_{φ̂_T} is the usual OLS standard error of the estimator,
$$\hat\sigma_{\hat\phi_T} = \sqrt{\frac{s_T^2}{\sum_{t=1}^{T} Y_{t-1}^2}},$$
with s_T² the OLS estimator of the residual variance.
The distributions (3.2.3) and (4.1.4) are known as Dickey-Fuller distributions, since David
Dickey and Wayne Fuller developed the asymptotics of this unit root test.
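The statistic (4.1.2) is straightforward to compute from a sample. A Python sketch (not from the thesis; here s_T² is taken as the sum of squared OLS residuals divided by T − 1, an assumption since the exact definition of s_T² does not survive in this excerpt):

```python
import numpy as np

# Sketch of the Dickey-Fuller t-statistic (4.1.2) for the no-drift model,
# assuming Y_0 = 0 and s_T^2 = sum(residual^2) / (T - 1).
def df_tstat(Y):
    Ylag = np.concatenate(([0.0], Y[:-1]))
    phi_hat = (Ylag * Y).sum() / (Ylag**2).sum()
    resid = Y - phi_hat * Ylag
    s2 = (resid**2).sum() / (len(Y) - 1)
    se = np.sqrt(s2 / (Ylag**2).sum())
    return (phi_hat - 1.0) / se

rng = np.random.default_rng(5)
Y = rng.normal(size=500).cumsum()   # a unit root sample (H0 true)
print(df_tstat(Y))
```

Under H0 the statistic follows the Dickey-Fuller distribution rather than a t-distribution, which is why its critical values must be obtained by simulation, as in the next section.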
4.2 Critical Values of the Dickey-Fuller Distribution
With the Monte Carlo approach we simulated a distribution for the t-statistic (4.1.2) and are able to calculate the critical values for the test statistic t_T. Monte Carlo simulation is
based on the idea of repeated random sampling in order to approximate the underlying
distribution. Under the null hypothesis of a unit root, we repeatedly (N = 50, 000 times)
sampled a first order auto regressive process of length T to approximate the distribution
of tT . Figure 4.1 and Figure 4.2 show the outline of the distribution of the t-statistic for
the sample sizes T = 25, 50, 100, 250.
Figure 4.1: The approximation of the distributions of t_T for sample sizes T = 25 (a) and T = 50 (b). [Histograms of simulated t_T values.]
Figure 4.2: The approximation of the distributions of t_T for sample sizes T = 100 and T = 250. [Histograms of simulated t_T values.]
As we can see in both Figure 4.1 and Figure 4.2 the distribution of the t-statistic tT is
positively skewed. The finite sample critical values are obtained by sorting the results of
the finite sample Monte Carlo simulation and determining the critical value at significance
level α. Table 4.1 shows the critical values kTα of the test statistic tT for several different
sample sizes and significance levels α. The asymptotic distribution of the test statistic tT
is known as the Dickey-Fuller distribution (4.1.4). Instead of simulating data from the AR(1) process, we obtain the asymptotic critical values of t_∞ by sampling a Wiener Process and approximating the asymptotic distribution of the test statistic t_∞. Figure 4.3 shows an approximation of the distribution of t_∞, which has been used to approximate the asymptotic critical values of the Dickey-Fuller distribution listed in Table 4.1.
[Figure 4.3: Histogram approximating the distribution of t_∞.]
Table 4.1: Critical values kTα of tT for several significance levels α and sample sizes T .
Having obtained the critical values of the Dickey-Fuller test, we are able to examine the power of the Dickey-Fuller test at a certain significance level α. The next chapter examines the power of the Dickey-Fuller test for several values of φ and different sample sizes T of the AR(1) process (2.1.1) under Assumption 1. These powers will be evaluated at significance level α = 0.05, such that we will only consider the critical values listed in the last column of Table 4.1. We will not only look into the power of the finite sample process, but also examine the asymptotic power when the parameter φ is close to unity. If we examine the finite sample power we will use the finite sample critical values, and if we examine the asymptotic power we will use the asymptotic critical value k_∞^α = −1.9312.
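The Monte Carlo procedure described above can be sketched compactly. The thesis used Matlab with N = 50,000 replications; the following Python sketch uses a smaller N for speed, with the same structure (simulate under H0, compute t_T, take the α-quantile):

```python
import numpy as np

# Sketch of the Monte Carlo critical-value computation: simulate the AR(1)
# process under H0 (phi = 1) N times, compute the t-statistic for each
# replication, and take the alpha-quantile of the simulated statistics.
rng = np.random.default_rng(6)
T, N, alpha = 100, 5_000, 0.05

tstats = np.empty(N)
for r in range(N):
    Y = rng.normal(size=T).cumsum()          # random walk under H0
    Ylag = np.concatenate(([0.0], Y[:-1]))
    phi_hat = (Ylag * Y).sum() / (Ylag**2).sum()
    resid = Y - phi_hat * Ylag
    se = np.sqrt((resid**2).sum() / (T - 1) / (Ylag**2).sum())
    tstats[r] = (phi_hat - 1.0) / se

k_alpha = np.quantile(tstats, alpha)
print(k_alpha)   # in the neighborhood of the tabulated value -1.93
```

Sorting the simulated statistics and reading off the α-quantile is exactly the procedure used for Table 4.1; the residual-variance convention here is an assumption, so small numerical differences from the tabulated values are expected.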
Chapter 5
Power analysis provides information on how well a test detects an effect. An effect is the difference between the value of the parameter under the null hypothesis (φ_0 = 1) and the actual true value (φ_1). Since we will discuss the performance of the Dickey-Fuller test, it is useful to check the power of the test under different circumstances. We will analyze the power of the test for several sample sizes and effect sizes. The power of a statistical test is the probability that the null hypothesis is rejected at a fixed significance level α when in fact the null hypothesis is false, which is equivalent to correctly accepting the alternative hypothesis. Thus
$$\mathrm{Power} = \Pr(H_0 \text{ is rejected} \mid H_0 \text{ is false}) = \Pr(H_0 \text{ is rejected} \mid \phi = \phi_1 < 1). \qquad (5.0.1)$$
The statistical power depends on the sample size; power calculations are often used to determine the sample size needed to detect an effect of a given size at significance level α. To enlarge the power of a
sample critical values listed in Table 4.1 we can obtain the power of the Dickey-Fuller
test. With Monte Carlo simulation we approximated the power of the Dickey-Fuller test
at significance level α = 0.05. We calculated the power of the test for AR(1) processes
under Assumption 1, for different alternative hypotheses H1 : φ = φ1 and several sample
sizes T = 25, T = 50, T = 100 and T = 250. For the sake of clarity we define φ1 as
the parameter of the AR(1) process we simulated from and φ = 1 as the value of the
parameter under the null hypothesis.
Since the power of the test depends on the difference between the true parameter φ_1 and φ = 1, we compute the power of the Dickey-Fuller test for a sequence of alternatives φ_1. In Table 5.1 we see that for φ_1 = 0.5 the power of the test is very close to 1, which is very high. But if we consider φ_1 = 0.9 we conclude that the power of the test is low: the probability of a type II error β, satisfying Power = 1 − β, is very high; in other words, the probability of failing to reject the false null hypothesis is large.
φ1     T = 25   T = 50   T = 100   T = 250
0.5    0.907    1.000    1.000     1.000
0.6    0.774    0.996    1.000     1.000
0.7    0.566    0.977    1.000     1.000
0.8    0.335    0.761    0.999     1.000
0.9    0.153    0.322    0.752     1.000
1      0.050    0.057    0.045     0.051
Table 5.1: The power of the Dickey-Fuller test for finite sample sizes T at significance
level α = 0.05.
As shown in Table 5.1, the power decreases as the distance |φ1 − φ| gets smaller. To
illustrate this decreasing power, Figure 5.1 shows the power of the Dickey-Fuller test at
significance level α = 0.05 for several sample sizes T.
Figure 5.1: The power of the Dickey-Fuller test at significance level α = 0.05 for T =
25, 50, 100, 250.
Figure 5.1 illustrates the importance of unit root testing. When the value φ1 approaches
unity, the Dickey-Fuller test does not perform the way we would like it to:
the power of the test for values of φ1 close to unity is low. However, if the distance
|φ1 − 1| is large, there is no need to test for non-stationarity since the time series will show
stationary properties. Therefore the area of interest is the region of φ1 close to unity, and
we will examine the properties of the Dickey-Fuller test for a sequence of φ1's close to 1.
Since we have examined the asymptotic properties of the Dickey-Fuller test, we assume
that we can add data such that the sample size grows indefinitely. To shrink the area
of interest to a small neighborhood of unity, we introduce the local alternative framework,
which shrinks the neighborhood of unity as the sample size T grows indefinitely.
Figure 5.2: Power of the Dickey-Fuller test for local alternatives φ1 = 1 + c1/T close to unity.
5.2.1 εt ∼ N(0, 1)
In this section, we will examine the power of the Dickey-Fuller test when the innovations
are standard normally distributed, εt ∼ N(0, 1). Monte Carlo simulation is used to perform
the Dickey-Fuller test. To be able to compare the power of the Dickey-Fuller test
simulated with Gaussian innovations with the power simulated with White Noise innovations,
we will sample an AR(1) process for several local alternatives φ1 = 1 + c1/T close to unity
at nominal significance level α = 0.05 for different sample sizes T.
Table 5.2: The power of the Dickey-Fuller test for N (0, 1) innovations at nominal signifi-
cance level α = 0.05.
5.2.2 εt ∼ N(0, σ²)
Since the innovations could also be normally distributed with mean zero and a known
constant variance σ² ≠ 1, it is interesting to examine the power of the Dickey-Fuller test
in this special case. By Monte Carlo simulation we have obtained Table 5.3, which contains
the power of the Dickey-Fuller test for several sample sizes T and local alternatives
c1 ∈ Z, c1 < 0, such that φ1 = 1 + c1/T. For simulation purposes, we performed Monte Carlo
simulation with known variance σ² = 4.
Table 5.3: The power of the Dickey-Fuller test for N(0, σ² = 2²) innovations at nominal
significance level α = 0.05.
By Monte Carlo simulation we have obtained the power for different types of innovations,
and we can compare these values in order to see which type of innovations produces
the highest power of the Dickey-Fuller test. Table 5.4 illustrates the difference between
the power for the three types of innovations for several fixed local alternatives c1 ∈ Z, c1 < 0.
From Table 5.4 we can conclude that the time series simulated with standard normal
innovations leads to the least biased power estimation. We have fixed the significance
level at α = 0.05, which means that we allow the probability of falsely rejecting the null
hypothesis (the probability of a type I error) to be 5%. As we can see in the first line of
Table 5.4, the power of the Dickey-Fuller test for the three types of innovations does not
correspond to the significance level α = 0.05; therefore we can conclude that the results
are biased. Since the probability of a type I error for the AR(1) process simulated with
standard normal innovations is the closest to 0.05, we conclude that the power of the
Dickey-Fuller test for this type of innovations is the least biased.
If the data gives reason to assume the innovations to be Gaussian, a different unit root
test becomes available. If indeed the innovations are IID and Gaussian, we are able to
implement the Likelihood Ratio test for unit roots. In the next chapter we will look into
the Likelihood Ratio test and examine the power of this unit root test. Furthermore,
with the Likelihood Ratio test we can compute the asymptotic power envelope for unit
root tests.
−c1    WN(0, σ²)   N(0, 1)   N(0, σ²)
0      0.042       0.054     0.063
2      0.106       0.136     0.128
4      0.230       0.274     0.244
6      0.399       0.415     0.419
8      0.579       0.625     0.580
10     0.761       0.761     0.750
Table 5.4: The power of the Dickey-Fuller test for several innovations and large sample
size T = 500 with nominal significance level α = 0.05.
Chapter 6
Section 5.2 approximates the asymptotic power of the Dickey-Fuller test under the as-
sumption that the innovations are Gaussian. The Monte Carlo simulation of the AR(1)
process is done with Gaussian innovations rather than White Noise innovations. In order
to perform further power analysis on the AR(1) process with Gaussian innovations, we can
implement the Likelihood Ratio test. This test can also be implemented for other types
of innovations, but in that case the computation of the Likelihood Ratio would become
rather difficult. Instead we stick to Gaussian innovations which are rather common in ap-
plications. By means of the Likelihood Ratio test, we can compute the asymptotic power
envelope for unit root tests. The asymptotic power envelope is a great tool to compare
unit root tests to the maximum asymptotically attainable power. Therefore, within this
chapter the Likelihood Ratio test will be defined and it will be explained how to compute
the power envelope as well as the asymptotic power envelope for unit root tests.
The Likelihood Ratio test is based on the likelihoods of two models: the first is the model
under the null hypothesis and the second is the model under the alternative hypothesis.
Both models will be fitted to the time series data and the likelihood functions will be
calculated in order to determine which model is more likely to be true (hence the name
Likelihood Ratio test). If the model under the null hypothesis is more likely to be true,
this results in a large test statistic, denoted ΛT(·) (dependent on the sample size T);
the null hypothesis is therefore rejected for small values of the test statistic ΛT(·). In the
unit root problem the null hypothesis and the simple alternative hypothesis are the following:

\[
H_0: \phi = 1 \qquad \text{versus} \qquad H_1: \phi = \phi_1 < 1.
\]
6.1 Computing the Likelihood Functions
First we calculate the likelihood functions L(φ = φ0 | Y1, . . . , YT) and L(φ = φ1 | Y1, . . . , YT).
The trick is to write the process {Yt}t≥0 in terms of the innovations εt = Yt − φYt−1. Since
we assumed the innovations to be independent and identically distributed Gaussian with zero
mean and known constant variance σ², we can easily compute the likelihood functions.
Let us compute the likelihood functions of the hypotheses of interest:
\[
\begin{aligned}
f(\varepsilon_1, \ldots, \varepsilon_T \mid \phi = 1) &= \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_t^2}{2\sigma^2}\right)\\
&= (2\pi\sigma^2)^{-T/2} \exp\left(-\frac{\sum_{t=1}^{T} \varepsilon_t^2}{2\sigma^2}\right)\\
&= (2\pi\sigma^2)^{-T/2} \exp\left(-\frac{\sum_{t=1}^{T} (Y_t - Y_{t-1})^2}{2\sigma^2}\right),
\end{aligned}
\tag{6.1.1}
\]

\[
\begin{aligned}
f(\varepsilon_1, \ldots, \varepsilon_T \mid \phi = \phi_1) &= \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_t^2}{2\sigma^2}\right)\\
&= (2\pi\sigma^2)^{-T/2} \exp\left(-\frac{\sum_{t=1}^{T} \varepsilon_t^2}{2\sigma^2}\right)\\
&= (2\pi\sigma^2)^{-T/2} \exp\left(-\frac{\sum_{t=1}^{T} (Y_t - \phi_1 Y_{t-1})^2}{2\sigma^2}\right).
\end{aligned}
\tag{6.1.2}
\]
With these likelihood functions we will obtain the Likelihood Ratio test statistic ΛT (Yt ) (6.0.1):
By substituting φ1 = 1 + c1/T in (6.1.3) we obtain

\[
\begin{aligned}
\Lambda_T(\varepsilon_t) &= \exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{T}\left[\left(2\left(1+\frac{c_1}{T}\right)-2\right)Y_t Y_{t-1} + \left(1-\left(1+\frac{c_1}{T}\right)^2\right)Y_{t-1}^2\right]\right)\\
&= \exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{T}\left[\frac{2c_1}{T}Y_t Y_{t-1} - \frac{2c_1}{T}Y_{t-1}^2 - \frac{c_1^2}{T^2}Y_{t-1}^2\right]\right)\\
&= \exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{T}\left[\frac{2c_1}{T}Y_{t-1}(Y_t - Y_{t-1}) - \frac{c_1^2}{T^2}Y_{t-1}^2\right]\right)\\
&= \exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{T}\left[\frac{2c_1}{T}Y_{t-1}\Delta Y_t - \frac{c_1^2}{T^2}Y_{t-1}^2\right]\right).
\end{aligned}
\tag{6.1.4}
\]

The resulting Likelihood Ratio test statistic is thus

\[
\Lambda_T(\varepsilon_t) = \exp\left(-\frac{1}{\sigma^2}\sum_{t=1}^{T}\left[\frac{c_1}{T}Y_{t-1}\Delta Y_t - \frac{1}{2}\left(\frac{c_1}{T}\right)^2 Y_{t-1}^2\right]\right). \tag{6.1.5}
\]

The log Likelihood Ratio is defined as the natural logarithm of the Likelihood Ratio:

\[
\log\left[\Lambda_T(\varepsilon_t)\right] = -\frac{1}{\sigma^2}\sum_{t=1}^{T}\left[\frac{c_1}{T}Y_{t-1}\Delta Y_t - \frac{1}{2}\left(\frac{c_1}{T}\right)^2 Y_{t-1}^2\right]
= -\frac{c_1}{\sigma^2 T}\sum_{t=1}^{T}Y_{t-1}\Delta Y_t + \frac{c_1^2}{2T^2\sigma^2}\sum_{t=1}^{T}Y_{t-1}^2. \tag{6.1.6}
\]

We can simplify (6.1.6) by substituting

\[
A_T = \frac{1}{T\sigma^2}\sum_{t=2}^{T}Y_{t-1}\Delta Y_t, \qquad B_T = \frac{1}{T^2\sigma^2}\sum_{t=2}^{T}Y_{t-1}^2,
\]

and we obtain the test statistic of the Likelihood Ratio test:

\[
\log\left[\Lambda_T(\varepsilon_t)\right] = -c_1 A_T + \frac{1}{2}c_1^2 B_T. \tag{6.1.7}
\]
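As a sanity check on (6.1.7), the statistic can be computed directly and compared with the difference of the two Gaussian log likelihoods; a Python sketch follows (σ² = 1; the sample size and the value of c1 are arbitrary illustrative choices, and the thesis code itself is Matlab):

```python
import numpy as np

def log_lr(y, c1, sigma2=1.0):
    """log Lambda_T = -c1*A_T + 0.5*c1**2*B_T, cf. (6.1.7)."""
    T = len(y)
    dy = np.diff(y)       # Delta Y_t = Y_t - Y_{t-1}
    y_lag = y[:-1]        # Y_{t-1}
    A_T = np.dot(y_lag, dy) / (T * sigma2)
    B_T = np.dot(y_lag, y_lag) / (T**2 * sigma2)
    return -c1 * A_T + 0.5 * c1**2 * B_T

rng = np.random.default_rng(0)
T, c1 = 500, -5.0
phi1 = 1 + c1 / T         # local alternative
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi1 * y[t - 1] + rng.standard_normal()
stat = log_lr(y, c1)
```

Up to the Gaussian normalising constants (which cancel in the ratio), the statistic is exactly the log likelihood under the null minus the log likelihood under the alternative, which is why the test rejects for small values.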
As a result we have obtained the test statistic of the log Likelihood Ratio test. The
test rejects the null hypothesis, in favor of the process being stationary, for small values
of (6.1.7). Since the alternative hypothesis depends on the fixed alternative c1, each local
alternative corresponds to its own critical value lTα(c1) at significance level α. Let us
define the set of M ∈ N negative local alternatives

\[
C = \{c_1, c_2, \ldots, c_M \mid c_i \in \mathbb{Z}_{<0} \text{ and } c_i > c_{i+1} \ \forall i \le M\}.
\]
6.2 Asymptotic Critical Values of the Likelihood Ratio Test

The previous section resulted in the derivation of the log Likelihood Ratio test statistic. Since
each critical value lTα(ci) corresponds to a fixed local alternative ci ∈ C being tested,
we have to determine the critical values corresponding to the M fixed alternatives. The
Likelihood Ratio test rejects the null hypothesis in favor of the fixed alternative ci for small
values of the test statistic log [ΛT(Yt)], so the critical values satisfy

\[
\Pr_{\phi=1}\left(\log\left[\Lambda_T(Y_t)\right] \le l_T^{\alpha}(c_i)\right) = \alpha.
\]
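In practice these critical values are obtained by Monte Carlo: simulate the statistic under φ = 1 and take the empirical α-quantile. A Python sketch (σ² = 1; sample size and replication count are arbitrary assumptions, and the thesis implementation is Matlab):

```python
import numpy as np

def lr_critical_value(c, T=500, alpha=0.05, reps=2000, rng=None):
    """Empirical alpha-quantile of log Lambda_T under H0: phi = 1 (reject for small values)."""
    if rng is None:
        rng = np.random.default_rng(0)
    stats = np.empty(reps)
    for k in range(reps):
        # random walk with Y_0 = 0: the process under the null hypothesis
        y = np.concatenate(([0.0], np.cumsum(rng.standard_normal(T - 1))))
        dy, y_lag = np.diff(y), y[:-1]
        A_T = np.dot(y_lag, dy) / T
        B_T = np.dot(y_lag, y_lag) / T**2
        stats[k] = -c * A_T + 0.5 * c**2 * B_T
    return np.quantile(stats, alpha)
```

By construction, roughly a fraction α of the null-simulated statistics falls below the returned value.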
for several (local-to-unity) fixed alternatives ci ∈ C such that φi = 1 + ci/T. The power
envelope is defined by:

\[
\Pi_T^{\alpha}(\phi_i) = \sup_{\rho_T} \Pr_{\phi = \phi_i}\left(\rho_T(Y) \text{ rejects } H_0\right),
\]
where ρT is a unit root test from the class, Y defines the time series of length T and φi the
fixed alternative. With help of the Neyman-Pearson lemma (explained in Appendix A.4),
there is a simple way to derive the power envelope. The Neyman-Pearson lemma states
that the Likelihood Ratio test is the most powerful test for two simple hypotheses H0 :
φ = 1 and H1 : φ = φi < 1. In section 6.1 we have calculated the Likelihood Ratio test
statistic and by the lemma we conclude that the point optimal unit root test rejects the
null hypothesis for small values of
\[
\log\left[\Lambda_T(\varepsilon_t)\right] = -T(\phi_i - 1)A_T + \frac{1}{2}\left[T(\phi_i - 1)\right]^2 B_T. \tag{6.3.2}
\]
This results in an explicit formula for the power envelope ΠαT(φi):

\[
\Pi_T^{\alpha}(\phi_i) = \Pr_{\phi_i}\left(-T(\phi_i - 1)A_T + \frac{1}{2}\left[T(\phi_i - 1)\right]^2 B_T \le l_T^{\alpha}(\phi_i)\right),
\]

with AT and BT defined in (6.1.7) and lTα(φi) the critical value corresponding to the fixed
alternative φi = 1 + ci/T at significance level α = 0.05, which satisfies

\[
\Pr_{\phi=1}\left(-T(\phi_i - 1)A_T + \frac{1}{2}\left[T(\phi_i - 1)\right]^2 B_T \le l_T^{\alpha}(\phi_i)\right) = \alpha.
\]
For every fixed alternative φi we can compute the corresponding lTα(φi) and calculate the
maximum power ΠαT(φi), such that the power envelope is pointwise obtainable. The result
is a sequence of most powerful tests, depending on the alternative being considered, for test
size α = 0.05. The optimal test against the alternative φ = φi < 1 depends on the
fixed alternative φi, and as a result there does not exist a uniformly most powerful test
at significance level α. As we shall see later on, this also holds for the asymptotic power
envelope.
Figure 6.1: The power of the test close to unity corresponds to the asymptotic size
α = 0.05.
Figure 6.1 shows the asymptotic power envelope for local alternatives extremely close
to unity. The figure shows that the size is only achieved asymptotically, such that we can
no longer speak of the exact test size, but of the asymptotic size of the test. The power of
the test for local alternatives close to unity approaches the value 0.05, which corresponds
with the test size α = 0.05. If we assume that the limit (6.4.1) exists, there is an explicit
formula for the asymptotic power envelope:

\[
\Pi_\infty^{\alpha}(c_i) = \lim_{T\to\infty} \Pr_{c_i}\left(-c_i A_T + \frac{1}{2}c_i^2 B_T \le l_\infty^{\alpha}(c_i)\right). \tag{6.4.2}
\]
Elliott et al. have proven that the asymptotic power envelope equals:

\[
\begin{aligned}
\Pi_\infty^{\alpha}(c_i) &= \lim_{T\to\infty} \Pr_{c_i}\left(-c_i A_T + \frac{1}{2}c_i^2 B_T \le l_\infty^{\alpha}(c_i)\right)\\
&= \Pr\left(-c_i \int_0^1 W_{c_i}(r)\,dW(r) - \frac{1}{2}c_i^2 \int_0^1 W_{c_i}(r)^2\,dr \le l_\infty^{\alpha}(c_i)\right).
\end{aligned}
\tag{6.4.3}
\]
\[
\begin{aligned}
\Pi_T^{\alpha}(\phi_i) &= \Pr_{\phi_i}\left(-T(\phi_i - 1)A_T + \frac{1}{2}\left[T(\phi_i - 1)\right]^2 B_T \le l_T^{\alpha}(\phi_i)\right)\\
&= \Pr_{c_i}\left(-c_i A_T + \frac{1}{2}c_i^2 B_T \le l_T^{\alpha}(c_i)\right)\\
&= \Pr_{c_i}\left(-c_i A_T + c_i^2 B_T - \frac{1}{2}c_i^2 B_T \le l_T^{\alpha}(c_i)\right)\\
&= \Pr_{c_i}\left(-c_i(A_T - c_i B_T) - \frac{1}{2}c_i^2 B_T \le l_T^{\alpha}(c_i)\right).
\end{aligned}
\tag{6.4.4}
\]
Elliott et al. provided us with the asymptotic distribution of the expressions for AT and
BT:

\[
A_T - c_i B_T \xrightarrow{d} \int_0^1 W_{c_i}(t)\,dW(t), \qquad B_T \xrightarrow{d} \int_0^1 W_{c_i}(t)^2\,dt.
\]
\[
\Pi_T^{\alpha}(c_i) = \Pr_{c_i}\left(-c_i(A_T - c_i B_T) - \frac{1}{2}c_i^2 B_T \le l_T^{\alpha}(c_i)\right)
\xrightarrow{L}
\Pi_\infty^{\alpha}(c_i) = \Pr_{c_i}\left(-c_i \int_0^1 W_{c_i}(t)\,dW(t) - \frac{1}{2}c_i^2 \int_0^1 W_{c_i}(t)^2\,dt \le l_\infty^{\alpha}(c_i)\right), \tag{6.4.5}
\]

where W(t) denotes a Wiener process and Wci(t) denotes an Ornstein-Uhlenbeck process
which satisfies the following stochastic differential equation:
\[
dW_{c_i}(t) = k\left(\mu - W_{c_i}(t)\right)dt + \sigma\,dW(t). \tag{6.5.1}
\]
In (6.5.1) Wci(·) denotes the Ornstein-Uhlenbeck process, W(·) a Wiener process, µ the
long term mean towards which the process tends to drift over time, σ = 1, and k = −ci,
where ci ∈ C is the fixed alternative. If µ ≠ 0 the data is not centered; therefore we
substitute Y(t) = Wci(t) − µ. The process Y(t) satisfies
\[
\begin{aligned}
e^{kt}Y(t) &= e^{ks}Y(s) + \int_s^t \sigma e^{ku}\,dW(u)\\
\Leftrightarrow\quad Y(t) &= e^{-k(t-s)}Y(s) + \int_s^t \sigma e^{-k(t-u)}\,dW(u).
\end{aligned}
\tag{6.5.6}
\]
As a result, after back substitution of Y(t) = Wci(t) − µ and k = −ci, we have found the
analytic solution to the Ornstein-Uhlenbeck stochastic differential equation:

\[
W_{c_i}(t) = \mu + e^{c_i(t-s)}\left(W_{c_i}(s) - \mu\right) + \int_s^t \sigma e^{c_i(t-u)}\,dW(u). \tag{6.5.7}
\]
Note that (6.4.6) is a simplified version of the general Ornstein-Uhlenbeck stochastic
differential equation (6.5.1), with µ = 0 and σ = 1. By means of this solution the simulation
of an Ornstein-Uhlenbeck process in Matlab is straightforward. With the solution to
this process we can calculate the asymptotic power envelope. Figure 6.2 illustrates the
asymptotic power envelope for fixed local alternatives ci ∈ C. The graph shows for each
alternative ci the maximum asymptotically attainable power of the class of unit root tests.

³By the product rule for Itô integrals [14].
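For instance, an Euler-Maruyama discretisation of this simplified process (µ = 0, σ = 1, k = −ci) on [0, 1] can be sketched as follows; this is an illustrative Python version (the thesis uses Matlab), and the step count is an arbitrary choice:

```python
import numpy as np

def simulate_ou(c, n_steps=1000, rng=None):
    """Euler-Maruyama path of dW_c(t) = c*W_c(t) dt + dW(t) on [0,1], starting at 0."""
    if rng is None:
        rng = np.random.default_rng(0)
    dt = 1.0 / n_steps
    w = np.zeros(n_steps + 1)
    for j in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal()   # Wiener increment
        w[j + 1] = w[j] + c * w[j] * dt + dW       # drift pulls the path back to 0 for c < 0
    return w
```

With c = 0 the recursion reduces to a plain Wiener path; with c < 0 the path is mean-reverting, which is exactly the local-to-unity behaviour entering the power envelope.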
[Figure 6.2: the asymptotic power envelope for fixed local alternatives ci ∈ C, plotted as power against φ close to unity.]
\[
\Pi_\infty^{\alpha}(c_i) = \Pr_{c_i}\left(-c_i \int_0^1 W_{c_i}(t)\,dW(t) - \frac{1}{2}c_i^2 \int_0^1 W_{c_i}(t)^2\,dt \le l_\infty^{\alpha}(c_i)\right). \tag{6.6.1}
\]
The asymptotic power envelope is the maximum attainable asymptotic power of unit
root tests against the fixed alternative ci . Another way to define this maximum at-
tainable power is in terms of the powercurve Πα∞ (c, ci ), which is the power of the unit
root test corresponding to the critical value lα∞(ci), subject to the set of alternatives
C = {c1, c2, . . . , cM | ci ∈ Z<0 and ci > ci+1 ∀ i ≤ M}:

\[
\Pi_\infty^{\alpha}(c, c_i) = \Pr\left(-c_i \int_0^1 W_c(t)\,dW(t) - \frac{1}{2}c_i^2 \int_0^1 W_c(t)^2\,dt \le l_\infty^{\alpha}(c_i)\right), \tag{6.6.2}
\]
where ci denotes the alternative for which the power curve is computed and c defines the
actual value of the AR(1) process. If we maximize this power curve Πα∞(c, ci) in (6.6.2)
over the range of local alternatives in the small neighborhood around unity, we obtain the
asymptotic power envelope at the fixed alternative ci:
\[
\begin{aligned}
\max_c \Pi_\infty^{\alpha}(c, c_i) &= \max_c \Pr\left(-c_i \int_0^1 W_c(t)\,dW(t) - \frac{1}{2}c_i^2 \int_0^1 W_c(t)^2\,dt \le l_\infty^{\alpha}(c_i)\right)\\
&= \Pr\left(-c_i \int_0^1 W_{c_i}(t)\,dW(t) - \frac{1}{2}c_i^2 \int_0^1 W_{c_i}(t)^2\,dt \le l_\infty^{\alpha}(c_i)\right)\\
&= \Pi_\infty^{\alpha}(c_i).
\end{aligned}
\tag{6.6.3}
\]
To select the overall best performing unit root test, we compare each power curve to
the asymptotic power envelope; the method of least squares then yields the best overall
performing test. The method of least squares minimizes, over the design alternative
ci ∈ C, the following function:

\[
\min_{c_i \in C} \sum_{j=1}^{M} \left(\Pi_\infty^{\alpha}(c_j) - \Pi_\infty^{\alpha}(c_j, c_i)\right)^2.
\]

This method returns the fixed alternative ci such that the test obtained with this fixed
alternative is the best performing one. As a result we have found a unit root test with
the best overall performance.
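The selection step itself is elementary; the sketch below (in Python, with made-up placeholder power values rather than thesis results) picks the design alternative whose power curve is closest to the envelope in the least-squares sense:

```python
import numpy as np

# Hypothetical power values on a grid of alternatives C = {c_1, ..., c_M} (placeholders)
envelope = np.array([0.10, 0.30, 0.55, 0.80])        # envelope values Pi_inf^alpha(c_j)
# power_curves[i, j]: power of the test designed against c_i, evaluated at c_j
power_curves = np.array([
    [0.10, 0.28, 0.50, 0.70],
    [0.09, 0.29, 0.54, 0.78],
    [0.07, 0.25, 0.53, 0.79],
])

# Least-squares distance of each power curve to the envelope
sse = np.sum((envelope - power_curves) ** 2, axis=1)
best = int(np.argmin(sse))   # index of the best-performing design alternative
```

Here the second curve hugs the envelope most closely, so `best` selects it.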
Chapter 7
Summary of Results
This thesis has focused on the unit root process. In the first order auto regressive process
Yt = φYt−1 + εt, a unit root is present if φ = 1, and we investigated the ordinary least
squares estimator to construct a standard t-test statistic. The test we examined is known
as the Dickey-Fuller test, which rejects the null hypothesis for small values of
tT = (φ̂ − 1)/σ̂φ̂, and we concluded that the limiting distribution corresponds to the
Dickey-Fuller distribution (4.1.4). The distribution of the innovations is important for the
type of unit root test being considered. If we assume εt ∼ WN(0, σ²), the Dickey-Fuller
test is an appropriate unit root test, whereas if we assume εt ∼ N(0, σ²) we can test the
null hypothesis with the Likelihood Ratio test, which has the benefit of allowing further
power analysis on the unit root process.
The Likelihood Ratio test rejects the null hypothesis of the presence of a unit root for
small values of

\[
\log\left[\Lambda_T(\varepsilon_t)\right] = -c_1 A_T + \frac{1}{2}c_1^2 B_T,
\]

and with help of the Neyman-Pearson lemma we obtained that this test provides, for each
fixed alternative, the point optimal test. The Likelihood Ratio gave the opportunity to
compute the asymptotic power envelope. The analysis of the asymptotic power envelope
by Elliott, Rothenberg and Stock proved that there exists an explicit formula for the
asymptotic power envelope:

\[
\Pi_\infty^{\alpha}(c_i) = \Pr_{c_i}\left(-c_i \int_0^1 W_{c_i}(t)\,dW(t) - \frac{1}{2}c_i^2 \int_0^1 W_{c_i}(t)^2\,dt \le l_\infty^{\alpha}(c_i)\right).
\]
Chapter 8
Discussion
This thesis discussed the Dickey-Fuller test, the Likelihood Ratio test, the asymptotic
theory of unit root processes, and power envelopes for the AR(1) process without intercept
and deterministic trend. AR(1) processes are not always suitable in empirical studies on
time series and, often, higher order auto regressive processes, AR(p) with p ∈ Z, p > 1, with
intercept and/or a deterministic trend are considered. Therefore we recommend further
research on the asymptotic distribution of the AR(p) process when Assumption 1
is not valid. We will give an outline of the extra simulation and transformations that
need to be done in the case of several other auto regressive processes. Since the unit root
problem is widely studied in econometrics, there are quite some papers which shed light
on the less simplified auto regressive process with a unit root. In the following paragraphs
we will give insight into the way one could obtain these different limiting distributions.
The first order auto regressive process under Assumption 1 is a simplified version
of the general first order auto regressive process: in Assumption 1 there is no intercept
and no deterministic trend present. The Dickey-Fuller test we examined is only valid
when we work with the AR(1) process (2.1.1) under Assumption 1. If we add an intercept
m or a deterministic trend γt to the process' equation, it will result in a different test
statistic and asymptotic distribution compared to the one we examined in this thesis. If
an intercept m ≠ 0 and a deterministic trend are present in the AR(1) process, this yields
the following auto regressive process:

\[
Y_t = m + \phi Y_{t-1} + \gamma t + \varepsilon_t, \tag{8.0.1}
\]

where εt represents IID White Noise with zero mean and constant variance. David Dickey
and Wayne Fuller computed alternative ways to obtain the limiting distribution of (8.0.1).
If one is interested in computing the test statistic and the power of the Dickey-Fuller test
when an intercept m ≠ 0 is present, it is possible to center the data by examining Yt − m
as opposed to Yt. In this case, under the null hypothesis the increments are equal to the
innovations and the process Yt − m behaves as a random walk.
Most often it is not evident whether a time series contains a deterministic trend, thus one
has to examine the presence of a trend before choosing the appropriate Dickey-Fuller test.
Harvey, Leybourne and Taylor (2009) [9] have written a paper in which they discuss the
uncertainty about the trend and the initial value Y0. They concluded that it is possible to
detrend the data in order to perform a Dickey-Fuller type unit root test. As a result they
could argue that Dickey-Fuller type unit root tests are almost asymptotically efficient
when a deterministic trend is present in the data. We can conclude that there are ways
to use the Dickey-Fuller unit root test for data containing a deterministic trend, and we
would recommend considering several detrending methods in order to find the test with
locally optimal power.
In reality the first order auto regressive process is not often used to fit
a time series; often higher orders p ∈ N of regression are fitted instead. Since
the characteristic equation of such a series has several roots, testing for the presence of
a unit root becomes more difficult. Instead of the standard Dickey-Fuller test, the
Augmented Dickey-Fuller test, introduced by Wayne Fuller [6], is used. An important
implication of testing for a unit root in higher order auto regressive processes is the
proper selection of the lag p.
Another point of criticism we should mention is the fact that the least squares estimator
is biased. By definition of the critical values, the probability of a type I error must be
smaller than the significance level α = 0.05. Table 5.4 shows that these results do not
match the significance level α, thus we can conclude that the results are biased. There
are several methods available to reduce bias in a Monte Carlo simulation, but as this was
not the aim of the thesis, we did not implement and test these bias reduction methods.
To perform a more accurate power analysis of the Dickey-Fuller test, we recommend
applying a bias reduction method.
Bibliography
[1] David A. Dickey and Wayne A. Fuller. Distribution of the estimators for autoregressive
time series with a unit root. Journal of the American Statistical Association,
74(366a):427–431, 1979.
[2] David A. Dickey and Wayne A. Fuller. Likelihood ratio statistics for autoregressive
time series with a unit root. Econometrica: Journal of the Econometric Society,
pages 1057–1072, 1981.
[3] Joseph L. Doob. Stochastic processes, volume 101. New York Wiley, 1953.
[4] Graham Elliott, Thomas J. Rothenberg, and James H. Stock. Efficient tests for an
autoregressive unit root, 1992.
[5] Graham Elliott, Thomas J. Rothenberg, and James H. Stock. Efficient tests for an
autoregressive unit root, 1996.
[6] Wayne A. Fuller. Introduction to statistical time series, volume 428. John Wiley &
Sons, 2009.
[7] Niels Haldrup and Michael Jansson. Improving size and power in unit root testing.
Palgrave handbook of econometrics, 1:252–277, 2006.
[8] James D. Hamilton. Time series analysis, volume 2. Princeton University Press,
Princeton, 1994.
[9] David I. Harvey, Stephen J. Leybourne, and Robert Taylor. Unit root testing in
practice: dealing with uncertainty over the trend and initial condition. Econometric
Theory, 25(03):587–636, 2009.
[10] Robin High. Important factors in designing statistical power analysis studies.
Computing News, Summer(14-15), 2000.
[11] Jerzy Neyman and Egon S. Pearson. On the Problem of the Most Efficient Tests of
Statistical Hypotheses. Royal Society of London Philosophical Transactions Series
A, 231:289–337, 1933.
[12] Peter C.B. Phillips and Pierre Perron. Testing for a unit root in time series regression.
Biometrika, 75(2):335–346, 1988.
[13] G. William Schwert. Tests for unit roots: A Monte Carlo investigation. Journal of
Business & Economic Statistics, 20(1):5–17, 2002.
[14] Steven E. Shreve. Stochastic calculus for finance II: Continuous-time models, vol-
ume 11. Springer, 2004.
[15] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the brownian motion. Phys.
Rev., 36:823–841, Sep 1930.
Appendices
Appendix A
Auxiliary results
¹Without loss of generality we can take the variance σ² = 1.
Theorem A.3.1 (Continuous Mapping Theorem). The continuous mapping theorem
states that if ST (·) → S(·) and g(·) is a continuous functional, with S(·) a continuous-
time stochastic process and with S(r) representing its value at some time r ∈ [0, 1], then
g(ST (·)) → g(S(·)).
Appendix B
% Finite-sample critical values of the Dickey-Fuller test.
% The declarations of N, T and phi and the computation of the t-statistic
% were lost from this listing; the lines marked "assumed/restored" are
% reconstructions, not the original values.
clear all;
%randn('seed',1234);
N=10000;   % number of Monte Carlo replications (assumed)
T=100;     % sample size (assumed)
phi=1;     % simulate under the null hypothesis (restored)
hatsigma=zeros(1,N);
hatphi=zeros(1,N);
ttest=zeros(1,N);
for k=1:N
X=zeros(1,T);
Z1=zeros(1,T);
Z2=zeros(1,T);
Y1=zeros(1,T);
for j = 2:T
X(j)= phi*X(j-1)+rand-0.5;   % White Noise innovations, uniform on [-0.5,0.5]
Z1(j)=X(j-1)*X(j);
Z2(j)=X(j-1)^2;
end
Z1=cumsum(Z1);
Z2=cumsum(Z2);
hatphi(k)=Z1(T)/Z2(T);       % OLS estimator of phi
for j=2:T
Y1(j)=(X(j)-hatphi(k)*X(j-1))^2;
end
Y1 = cumsum(Y1);
hatsigma(k)=sqrt((Y1(T)/(T-1))/Z2(T));   % standard error of hatphi (restored)
ttest(k)=(hatphi(k)-1)/hatsigma(k);      % Dickey-Fuller t-statistic (restored)
end
dfcritical=sort(ttest);
plot(dfcritical);
nbins=100;
hist(dfcritical,nbins)
% Critical values at the 1%, 2.5% and 5% level (restored)
alpha1=dfcritical(round(0.01*N));
alpha2=dfcritical(round(0.025*N));
alpha3=dfcritical(round(0.05*N));
alpha=[alpha1,alpha2,alpha3]
% Asymptotic critical values of the Dickey-Fuller distribution,
% simulated via a discretised Wiener process on [0,1].
% The declarations of N, N2 and dt and the df(k) computation were lost;
% the lines marked "assumed/restored" are reconstructions.
clear all;
%randn('seed',1234);
N=1000;    % number of time steps on [0,1] (assumed)
N2=10000;  % number of Monte Carlo replications (assumed)
T=1;
dt=T/N;
num=zeros(1,N2); %numerator of the DF distr.
den=zeros(1,N2); %denominator of the DF distr.
df=zeros(1,N2);
for k=1:N2
dW = zeros(1,N);
W = zeros(1,N);
Z=zeros(1,N);
Y= zeros(1,N);
dW(1) = sqrt(dt)*randn;
W(1) = dW(1);
for j = 2:N
dW(j) = sqrt(dt)*randn; % Wiener process simulation
W(j) = W(j-1)+dW(j);
Z(j) = (W(j-1)^2);
Y(j) = W(j-1) * dW(j);
end
Z = cumsum(Z);
Y = cumsum(Y);
den(k)=sqrt(dt*Z(N)); %denominator of the DF distr
num(k)= Y(N); %numerator of the DF distr
df(k)=num(k)/den(k);  % draw from the asymptotic DF distribution (restored)
end
dfcritical=sort(df);
plot(dfcritical);
nbins=100;
hist(dfcritical,nbins)
% Critical values at the 1%, 2.5% and 5% level (restored)
alpha1=dfcritical(round(0.01*N2));
alpha2=dfcritical(round(0.025*N2));
alpha3=dfcritical(round(0.05*N2));
alpha=[alpha1,alpha2,alpha3]
% Dickey Fuller test
% White Noise errors (rand-0.5)
N=1000;
T=500;   % sample size (assumed; the original declaration was lost)
c=[-10:1:0]; %fixed alternatives
%phi1=0.5:0.05:1;
M=length(c);
type2=zeros(1,M);
power=zeros(1,M);
for i=1:M
phi=1+c(i)/T;
Y=zeros(T,1);
hatphi=zeros(1,N);
teller=zeros(T,N);
noemer=zeros(T,N);
teller2=zeros(T,N);
hatsigma=zeros(1,N);
testvalue=zeros(1,N);
criticalvalue=-1.941;
count=0;
for k=1:N
Y(1)=0;
for t=2:T
Y(t)=phi*Y(t-1)+rand-0.5;   % White Noise innovations (was randn; changed to match the header comment)
% construction of hatphi
teller(t,k)=Y(t-1)*Y(t);
noemer(t,k)=Y(t-1)^2;
end
% construction of hatphi
teller=cumsum(teller);
noemer=cumsum(noemer);
hatphi(k)=teller(T,k)/noemer(T,k);
% construction of hatsigma
for t=2:T
teller2(t,k)=(Y(t)-hatphi(k)*Y(t-1))^2;
end
teller2=cumsum(teller2);
hatsigma(k)=sqrt(((teller2(T,k))/(T-1))/noemer(T,k));
testvalue(k)=(hatphi(k)-1)/hatsigma(k);
if(testvalue(k)> criticalvalue)
count=count+1;
end
end
%nbins=100;
%hist(testvalue,nbins);
type2(i)=count/N;   % fraction of non-rejections: type II error rate
power(i)=1-type2(i);
end
plot(1+c/T, power)
% Power of the Dickey-Fuller test for N(0,1) innovations
% (continuation: N, T and c are reused from the script above)
M=length(c);
type2=zeros(1,M);
power5=zeros(1,M);
power6=zeros(1,M);
power4=zeros(1,M);
for i=1:M
% Generate one unit root process of length T
% Calculate type II error
phi=1+(c(i)/T);
Y=zeros(T,1);
hatphi=zeros(1,N);
teller=zeros(T,N);
noemer=zeros(T,N);
teller2=zeros(T,N);
hatsigma=zeros(1,N);
testvalue=zeros(1,N);
criticalvalue=-1.9522;
count=0;
for k=1:N
Y(1)=0;
for t=2:T
Y(t)=phi*Y(t-1)+randn;   % N(0,1) innovations
% construction of hatphi
teller(t,k)=Y(t-1)*Y(t);
noemer(t,k)=Y(t-1)^2;
end
% construction of hatphi
teller=cumsum(teller);
noemer=cumsum(noemer);
hatphi(k)=teller(T,k)/noemer(T,k);
% construction of hatsigma
for t=2:T
teller2(t,k)=(Y(t)-hatphi(k)*Y(t-1))^2;
end
teller2=cumsum(teller2);
hatsigma(k)=sqrt(((teller2(T,k))/(T-1))/noemer(T,k));
testvalue(k)=(hatphi(k)-1)/hatsigma(k);
if(testvalue(k)> criticalvalue)
count=count+1;
end
end
type2(i)=count/N;
power5(i)=1-type2(i);
end
plot(1+c/T,power5)
hold on;
% Power of the Dickey-Fuller test for N(0,2^2) innovations (2*randn)
for i=1:M
phi=1+(c(i)/T);
Y=zeros(T,1);
hatphi=zeros(1,N);
teller=zeros(T,N);
noemer=zeros(T,N);
teller2=zeros(T,N);
hatsigma=zeros(1,N);
testvalue=zeros(1,N);
criticalvalue=-1.9645;
count=0;
for k=1:N
Y(1)=0;
for t=2:T
Y(t)=phi*Y(t-1)+2*randn;   % N(0,4) innovations
% construction of hatphi
teller(t,k)=Y(t-1)*Y(t);
noemer(t,k)=Y(t-1)^2;
end
% construction of hatphi
teller=cumsum(teller);
noemer=cumsum(noemer);
hatphi(k)=teller(T,k)/noemer(T,k);
% construction of hatsigma
for t=2:T
teller2(t,k)=(Y(t)-hatphi(k)*Y(t-1))^2;
end
teller2=cumsum(teller2);
hatsigma(k)=sqrt(((teller2(T,k))/(T-1))/noemer(T,k));
testvalue(k)=(hatphi(k)-1)/hatsigma(k);
if(testvalue(k)> criticalvalue)
count=count+1;
end
end
type2(i)=count/N;
power6(i)=1-type2(i);
end
% Power for 2*randn innovations at critical value -1.9682
for i=1:M
phi=1+c(i)/T;
Y=zeros(T,1);
hatphi=zeros(1,N);
teller=zeros(T,N);
noemer=zeros(T,N);
teller2=zeros(T,N);
hatsigma=zeros(1,N);
testvalue=zeros(1,N);
criticalvalue=-1.9682;
count=0;
for k=1:N
Y(1)=0;
for t=2:T
Y(t)=phi*Y(t-1)+2*randn;
% construction of hatphi
teller(t,k)=Y(t-1)*Y(t);
noemer(t,k)=Y(t-1)^2;
end
% construction of hatphi
teller=cumsum(teller);
noemer=cumsum(noemer);
hatphi(k)=teller(T,k)/noemer(T,k);
% construction of hatsigma
for t=2:T
teller2(t,k)=(Y(t)-hatphi(k)*Y(t-1))^2;
end
teller2=cumsum(teller2);
hatsigma(k)=sqrt(((teller2(T,k))/(T-1))/noemer(T,k));
testvalue(k)=(hatphi(k)-1)/hatsigma(k);
if(testvalue(k)> criticalvalue)
count=count+1;
end
end
type2(i)=count/N;
power4(i)=1-type2(i);
end
% critical values for the likelihood ratio test are dependent on the fixed
% alternative which is being used. Therefore we construct a vector cv with
% a critical value for each fixed alternative.
% The declarations below and the quantile step were lost from this listing;
% the lines marked "assumed/restored" are reconstructions.
clear all
N=1000;        % number of time steps on [0,1] (assumed)
N2=10000;      % number of Monte Carlo replications (assumed)
T=1;
dt=T/N;
c=-10:0.5:0;   % fixed alternatives (assumed, matching the power envelope script)
M=length(c);
cv=zeros(1,M);
lr=zeros(1,N2);
A=zeros(1,N2);
B=zeros(1,N2);
for i=1:M
for k=1:N2
dW = zeros(1,N);
W = zeros(1,N);
Z=zeros(1,N);
Y= zeros(1,N);
X=zeros(1,N);
dW(1) = sqrt(dt)*randn;
W(1) = dW(1);
for j=2:N   % loop header restored; missing from the listing
dW(j) = sqrt(dt)*randn;
W(j) = W(j-1)+dW(j);
Z(j) = (W(j-1)^2);
Y(j) = W(j-1) * dW(j);
end
Z = cumsum(Z);
Y = cumsum(Y);
A(k)=dt*Z(N);   % approximates int_0^1 W(t)^2 dt
B(k)= Y(N);     % approximates int_0^1 W(t) dW(t)
lr(k)=-c(i)*B(k)+0.5*c(i)^2*A(k);
end
lrsorted=sort(lr);
cv(i)=lrsorted(round(0.05*N2));   % empirical 5% quantile under the null (restored)
end
randn('seed',1234);
clearvars -except cv; %delete all variables except cv
%%%%%%%%%% Declaration of variables %%%%%%%%%%%%%%
% (the original declarations were lost; the values below are assumptions)
N=1000;        % number of Monte Carlo replications (assumed)
T=500;         % sample size (assumed)
c=-10:0.5:0;   % fixed alternatives, matching the critical values cv (assumed)
M=length(c);
powerlik=zeros(1,M);
A=zeros(T,N);
B=zeros(T,N);
Y=zeros(T,1);
lrtest=zeros(1,N);
for i=1:M
falt=c(i);     % fixed alternative used in the simulation (restored)
count=0;
for k=1:N
Y(1)=0;
for t=2:T
Y(t)=(1+(falt/T))*Y(t-1)+randn; %AR(1) process simulation
A(t,k)=Y(t-1)*(Y(t)-Y(t-1));
B(t,k)=Y(t-1)^2;
end
% log likelihood ratio statistic (6.1.7) (restored; missing from the listing)
AT=sum(A(:,k))/T;
BT=sum(B(:,k))/T^2;
lrtest(k)=-c(i)*AT+0.5*c(i)^2*BT;
if(lrtest(k)< cv(i))
count=count+1;
end
end
powerlik(i)=count/N;   % (restored)
end
plot(-c,powerlik);
hold on;
N=1000; %number of intervals of the integral
N2=1000; %reps
T=1;
dt=T/N;
c=-10:0.5:0; %fixed alternative
critval=cv;
powerenv=zeros(1,length(cv));
lrtest=zeros(1,N2);
for k=1:length(c);
count=0;
for i=1:N2
dW=zeros(1,N);
W=zeros(1,N);
Z=zeros(1,N);
Y=zeros(1,N);
dW(1) = sqrt(dt)*randn;
W(1) = dW(1);
for j=2:N
% Ornstein-Uhlenbeck path under the alternative c(k) (loop body restored; missing)
dW(j) = sqrt(dt)*randn;
W(j) = W(j-1) + c(k)*W(j-1)*dt + dW(j);
Z(j) = W(j-1)*dW(j);   % approximates int W_c(t) dW(t)
Y(j) = W(j-1)^2;       % approximates int W_c(t)^2 dt after scaling by dt
end
Z = cumsum(Z);
Y = dt*cumsum(Y);
lrtest(i)=-c(k)*Z(N)-0.5*c(k)^2*Y(N);
if lrtest(i)<cv(k)
count=count+1;
end
end
powerenv(k)=count/N2;
end
plot(-c,powerenv)