Working Material: Topics in Time Series Econometrics

Note: This content cannot be shared, uploaded, or distributed. It has not been written for publication. The contact email address is [email protected]
Contents

1 Complex Numbers
1.1 Introduction
1.2 Polar form
1.3 Exponential Form
1.4 Modulus (Absolute value)
1.5 Unit circle
1.6 Convergence of Series of Complex Numbers
1.7 Complex Polynomial
1.8 The Derivative
1.8.1 The Derivative function
1.8.2 The derivative at a point, z_0
1.9 Complex-differentiable function
1.9.1 What if the function is not continuous?
1.10 Regular Function
1.11 Convergence of a Power Series
1.11.1 Ratio test
1.11.2 Root test
1.12 Taylor's Theorem
1.13 Complex Random Variable
1.13.1 Expectation
1.13.2 Variance
1.13.3 Pseudo-variance
1.13.4 Covariance and Complementary Covariance
1.13.5 Uncorrelatedness
1.13.6 Orthogonality
2 Complex functions
2.1 The natural logarithm of a complex number
2.1.1 Principal Logarithm
2.1.2 When is the function continuous?
2.1.3 Complex differentiability of natural logarithm
2.2 The natural exponential of a complex number
2.2.1 Complex differentiability
2.3 The complex valued function: (1 − αz)^{−1}
2.3.1 Complex differentiability
2.3.2 Taylor Expansion
2.3.3 Convergence of the Taylor Expansion
3 Lag polynomial
6 Spectral Representation
6.1 Frequency Domain
6.2 Frequency Domain versus Time Domain
6.3 Advantages
6.4 Fourier Transform - Some Intuition
6.4.1 Winding the original series around a circle
6.4.2 The winding frequency
6.4.3 The center of mass of the winding graph
6.4.4 The center of mass as a function of the winding frequency
6.4.5 Winding the original series around a circle on the Complex Plane
6.5 Fourier Transform - Formal Definition
6.6 Fourier Inversion
6.6.1 Proof
6.7 Alternative Conventions for formal definitions
6.7.1 Putting the factor of 1/(2π) in the Fourier transform instead of in its inverse
6.7.2 Splitting the factor of 1/(2π) evenly between the Fourier transform and its inverse
6.8 Lag Operator Calculus and Fourier Transforms
6.8.1 Case 1
6.8.2 Case 2: AR(m)
6.8.3 Case 3: MA(n)
6.9 Spectral Density
6.9.1 Some Intuition
6.9.2 Formal Definition
6.9.3 Approximation of the spectral density by the second moment of the Fourier transform
6.9.4 Finding the autocovariance: Fourier Inverse
6.9.5 Lag Operator Calculus, Stationarity and the Spectral density
7 AR(p)
7.1 The lag polynomial
7.2 Solving the difference equation
7.3 Is the inverse of the lag polynomial well defined?
7.3.1 The roots of the characteristic polynomial
7.3.2 An alternative characteristic polynomial: The reflected polynomial
7.4 Is the function a regular one?
7.4.1 Inverting π(L)
7.4.2 Is the characteristic polynomial π(z)^{−1} analytic on the unit circle, |z| = 1?
7.5 Is the process stationary?
7.6 The Spectral Density
7.7 Impulse response Function
9 VAR(p)
9.1 Motivation
9.2 Companion Form Representation of an AR(p) Model
9.2.1 Stationarity of the Stacked VAR(1)
9.3 The VAR(1) model with 2 variables
9.3.1 Matrix Notation
9.3.2 Assumptions on the errors
9.3.3 Solving the system of equations: Inverting the matrix
9.3.4 The roots of the characteristic polynomial and the eigenvalues
9.3.5 Stationary Solution
9.4 VAR(1) with n variables
9.5 The VAR(2) model
9.5.1 Matrix Notation
9.5.2 Solving the system of equations: Inverting the matrix
9.5.3 Stationary Solution
9.6 The VAR(p) model
9.6.1 Stationary Solution
10.5 The Identification Problem
10.6 Reduced form to structure
10.6.1 A note about B0
10.6.2 Identification of R
10.7 Identification by Short Run Restrictions
10.8 Identification by Long Run Restrictions
10.9 Identification from Heteroskedasticity
Appendices
1 Complex Numbers
1.1 Introduction
z = x + iy (1)
• i = √(−1)
• Re(z) = x denotes the real part of z
• Im(z) = y denotes the imaginary part of z
• The absolute value, or modulus, of a complex number is denoted by:
|z| = √(x² + y²)
Notice that the modulus of a complex number is always a real number and in fact it will never be negative.
• The complex conjugate of a complex number is the number with an equal real part and an imaginary
part equal in magnitude, but opposite in sign.
z̄ = x − iy
An important identity:
z z̄ = |z|2
Thus, the product of a complex number and its conjugate is a real number.
• The modulus of the complex conjugate of a complex number is the same as the modulus of the original complex number:
|z| = |z̄|
√(x² + y²) = √(x² + (−y)²)
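These identities are easy to verify numerically. Below is a minimal sketch using Python's built-in complex type (the value z = 3 + 4i is an arbitrary illustration):

    import math

    z = 3 + 4j
    # modulus: |z| = sqrt(x^2 + y^2); Python's abs() computes it directly
    print(abs(z), math.hypot(z.real, z.imag))   # 5.0 5.0
    # z * conj(z) = |z|^2, a real number
    print(z * z.conjugate())                    # (25+0j)
    # |z| = |conj(z)|
    print(abs(z) == abs(z.conjugate()))         # True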
Figure 1: Complex plane
• φ is one element of θ, the argument of a complex number, which is defined as the angle inclined from the real axis in the direction of the complex number represented on the complex plane
θ = arg z
Figure 2: arg(z)
• It can be seen that the argument of any non-zero complex number has many possible values: firstly, as a geometrical angle, it is clear that whole circle rotations do not change the point, so angles differing by an integer multiple of 2π radians (a complete circle) represent the same point, as reflected in Figure 2
• Because a complete rotation around the origin leaves a complex number unchanged, there are many choices which could be made for θ by circling the origin any number of times
• Thus, arg z is a multi-valued (set-valued) function in the sense that it returns a set of values, one of which is φ
• The argument of z can be any of the infinitely many possible values of θ, each of which can be found by solving
tan θ = y/x
Thus,
θ = tan⁻¹(y/x) = arctan(y/x)
The inverse tangent is a multivalued function
• Thus, whenever we see arg z, we know that the function is not unique
• We would like to see φ in a function instead of arg z
• When a well-defined function is required, the usual choice, known as the principal value, is Arg(z): this represents an angle of up to half a complete circle from the positive real axis in either direction.
• Likewise, the argument of z in terms of the principal value is:
arg z = Arg(z) + 2πn, for some integer n
1.2 Polar form
• x is the adjacent side
• iy is the opposite side
• Our aim in this section is to write complex numbers in terms of a distance from the origin and a direction (or angle) from the positive horizontal axis.
• From Pythagoras, we have:
r² = x² + y²
r = √(x² + y²)
Secant
sec φ = r/x = hypotenuse/adjacent
Cotangent
cot φ = x/y = adjacent/opposite
• Thus, we have:
tan φ = y/x
x = r cos φ
y = r sin φ
iy = ir sin φ
z = x + iy = r(cos φ + i sin φ)
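The conversion between Cartesian and polar forms can be checked with Python's cmath module (a small sketch; the input value is arbitrary):

    import cmath

    z = 1 + 1j
    r, phi = cmath.polar(z)      # r = |z|, phi = angle from the positive real axis
    print(r, phi)                # 1.4142... 0.7853... (= pi/4)
    print(cmath.rect(r, phi))    # r(cos(phi) + i sin(phi)): back to (approximately) (1+1j)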
1.3 Exponential Form
Using Euler's formula, the polar form can be written in exponential form:
z = re^{iθ}   (5)
– Also, because any two arguments for a given complex number differ by an integer multiple of 2π, which is the measure in radians of a complete circle, we will sometimes write the exponential form as:
z = re^{i(θ+2πn)}, for any integer n

1.4 Modulus (Absolute value)
|z| = √([Re(z)]² + [Im(z)]²) = √(x² + y²)   (6)
|z| = r
Notice that the square root symbol represents the unique positive square root (when applied to a positive number), which means that the modulus of a complex number is always a nonnegative real number.
• Notice that the absolute value (or modulus) of the complex number is r in the polar form
• To get the value of r we can use (5), take the modulus of both sides, and then do a little simplification as follows:
|z| = |re^{iθ}| = |r| |e^{iθ}| = |r| |cos θ + i sin θ| = √(r² + 0²) √(cos²θ + sin²θ) = r
since cos²θ + sin²θ = 1 by the fundamental Pythagorean identity.
• When the imaginary part y is zero, this coincides with the definition of the absolute value of the real number x.
1.5 Unit circle
• The unit circle is the set of complex numbers whose modulus, |z|, is 1
• On the complex plane they form a circle centered at the origin with a radius of 1
• It includes the values:
– 1 at the right extreme
– i at the top extreme
– −1 at the left extreme
– −i at the bottom extreme
• On its exponential form it can be written as:
z = eiθ
We can also find its modulus. Recall the identity with the conjugate:
|z| = (z z̄)^{1/2}
|e^{iθ}| = √(e^{iθ} e^{−iθ}) = √(e⁰) = 1
• The interior of the unit circle is called the open unit disk
• While the interior of the unit circle combined with the unit circle itself is called the closed unit disk.
1.6 Convergence of Series of Complex Numbers
A series of complex numbers ∑_{m=1}^{∞} z_m converges to A if the sequence of partial sums converges to A; that is, if
lim_{n→∞} |∑_{m=1}^{n} z_m − A| = 0
• Theorem 1 If ∑_{m=1}^{∞} z_m converges, then
lim_{m→∞} z_m = 0
Thus, the terms z_m inside ∑_{m=1}^{∞} z_m go to zero as m goes to infinity.
• Cauchy's Criterion. ∑_{m=1}^{∞} z_m converges if and only if
∑_{m=k}^{n} z_m → 0
as k, n → ∞
• Absolute Convergence. ∑_{m=1}^{∞} z_m converges absolutely if ∑_{m=1}^{∞} |z_m| converges.
1.7 Complex Polynomial
P(z) = a₀ + a₁z + a₂z² + ⋯ + a_n zⁿ = ∑_{k=0}^{n} a_k zᵏ
• n, the degree of the polynomial, is a positive integer, n ≥ 1
• The a_k are complex numbers, not all zero
• z is a complex variable
• Any polynomial of degree n has precisely n roots (counted with multiplicity)
• A root is a value of z such that P(z) = 0
• The Fundamental Theorem of Algebra: For any polynomial of degree n, we can rewrite the polynomial in terms of its roots, z_i:
P(z) = a_n (z − z₁)(z − z₂) ⋯ (z − z_n)
• Define the reflected polynomial P̃(z) = a_n + a_{n−1}z + ⋯ + a₀zⁿ. That is, the coefficients of P̃(z) are the coefficients of P(z) in reverse order. Notice that:
P̃(z) = zⁿ P(z⁻¹)
• Indeed the roots are related. When finding the roots of P̃(z), we are finding the values of z such that:
P̃(z) = 0
zⁿ P(z⁻¹) = 0
Since z = 0 is not a root of P̃(z), we are finding the values of z such that:
P(z⁻¹) = 0
Let's denote by z_i the roots of the original polynomial, P(z). Given this notation, the roots λ_i of P̃(z) should be z_i⁻¹, since P(z⁻¹) is P(z) using the inverse variable. Therefore, we have that:
λ_i = z_i⁻¹   (7)
and that
P̃(z) = a₀ (z − λ₁)(z − λ₂) ⋯ (z − λ_n)
• Notice that:
P(z⁻¹) = a_n (z⁻¹ − z₁)(z⁻¹ − z₂) ⋯ (z⁻¹ − z_n)
and that
P̃(z⁻¹) = a₀ (z⁻¹ − λ₁)(z⁻¹ − λ₂) ⋯ (z⁻¹ − λ_n)
with
P(z) = zⁿ P̃(z⁻¹)
Thus,
P(z) = zⁿ a₀ (z⁻¹ − λ₁)(z⁻¹ − λ₂) ⋯ (z⁻¹ − λ_n)
P(z) = a₀ zⁿ (z⁻¹ − λ₁)(z⁻¹ − λ₂) ⋯ (z⁻¹ − λ_n)
P(z) = a₀ (z⁻¹ − λ₁)z (z⁻¹ − λ₂)z ⋯ (z⁻¹ − λ_n)z
P(z) = a₀ (1 − λ₁z)(1 − λ₂z) ⋯ (1 − λ_n z)
• We know that λ_i = z_i⁻¹. Thus, we can also write:
P(z) = a₀ (1 − z/z₁)(1 − z/z₂)(1 − z/z₃) ⋯ (1 − z/z_n)   (8)
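The relationship λ_i = z_i⁻¹ between the roots of P(z) and of the reflected polynomial P̃(z) can be verified numerically. A sketch with numpy (the coefficients a₀, …, a₃ are an arbitrary example):

    import numpy as np

    a = np.array([2.0, -3.0, 1.0, 4.0])     # P(z) = 2 - 3z + z^2 + 4z^3
    # np.roots expects coefficients ordered from the highest power down
    roots_P = np.roots(a[::-1])             # roots z_i of P(z)
    roots_reflected = np.roots(a)           # reflected polynomial: coefficients reversed
    print(np.sort_complex(roots_reflected))
    print(np.sort_complex(1.0 / roots_P))   # the same values: lambda_i = 1 / z_i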
1.8 The Derivative
1.8.1 The Derivative function
f′(z) = lim_{Δz→0} [f(z + Δz) − f(z)] / Δz
• Notice that the derivative is a limit, so for the derivative function to exist we need the limit to exist
• Knowing the set of values for which the function is not continuous is relevant since if the function is
not continuous at some values of z, then its derivative does not exist at those values.
• Recall that z = x + iy. Since i is a constant, a change in z can only be triggered by a change in x or in y. Thus:
Δz = Δx + iΔy
• Write f(z) = f(x + iy) = u(x, y) + iv(x, y), where u and v are the real and imaginary parts of f. If we only consider a change along the real axis, x, we have:
f′(z) = lim_{Δx→0} [f(z + Δx) − f(z)] / Δx
In terms of u and v:
f′(z) = u_x(x, y) + i v_x(x, y)
• On the other hand, if we only consider a change along the imaginary axis, y, we have that:
f′(z) = lim_{Δy→0} [f(z + iΔy) − f(z)] / (iΔy)
in which iΔy takes into account the fact that a change in y affects z by +iΔy. In terms of u and v, the change in y can be isolated in the sense that we do not need to express it as +iΔy.
TRICK: Notice that iy is not an argument of u or v. We need the denominator to be Δy so that the limit expression has the meaning of a partial derivative. Thus, using 1/i = −i, we proceed to obtain:
f′(z) = v_y(x, y) − i u_y(x, y)
Both expressions can be used; however, they can only be used if the derivative exists.
• Those two expressions help us to get the Cauchy-Riemann equations by equating the real parts and the imaginary parts, respectively:
u_x(x, y) = v_y(x, y)
v_x(x, y) = −u_y(x, y)
• Example:
f(x + iy) = (x + iy)² = x² − y² + i2xy
Notice that,
u(x, y) = x² − y² and v(x, y) = 2xy
Thus,
u_x = 2x = v_y
u_y = −2y = −v_x
Further, the derivative of f(z) is clearly f′(z) = 2z, which matches u_x + i v_x = 2x + i2y = 2(x + iy).
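The computations in this example can be reproduced symbolically. A sketch with sympy (illustrative only):

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    f = sp.expand((x + sp.I * y) ** 2)
    u, v = sp.re(f), sp.im(f)                          # u = x^2 - y^2, v = 2xy
    print(sp.simplify(sp.diff(u, x) - sp.diff(v, y)))  # 0, i.e. u_x = v_y
    print(sp.simplify(sp.diff(u, y) + sp.diff(v, x)))  # 0, i.e. u_y = -v_x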
1.8.2 The derivative at a point, z_0
f′(z_0) = lim_{z→z_0} [f(z) − f(z_0)] / (z − z_0)
1.9 Complex-differentiable function
If f is complex differentiable, then the Cauchy-Riemann equations must hold:
∂u/∂x = ∂v/∂y
and
∂v/∂x = −∂u/∂y
Note: When using these equations, we usually take advantage of the polar or exponential representation of z. However, it is not always so easy to do.
This will play an important role when analyzing the complex natural logarithm function.
1.9.1 What if the function is not continuous?
Example
• Let's analyze the simple case y = 0, so z = x ∈ ℝ
• Take for example the very simple function:
f(x) = x + 1 if x ≥ 0; f(x) = x if x < 0
• It is discontinuous at x = 0
• The limit as x approaches zero from the left (negative numbers) is 0, which is not equal to the limit as x approaches zero from the right (positive numbers), which is 1.
• However, if we apply the derivative formula like robots, differentiating each branch, we have that
f′(x) = 1
• But at x = 0, the derivative is the limit
lim_{x→0} [f(x) − f(0)] / x
To analyze whether it exists, we need to analyze the limit when approaching zero from the positive and from the negative numbers. Approaching zero from the negative numbers (where f(x) = x and f(0) = 1):
lim_{x→0⁻} (x − 1)/x = +∞
whereas approaching from the positive numbers gives lim_{x→0⁺} (x + 1 − 1)/x = 1. So, the limit does not exist. Therefore, the derivative at x = 0 does not exist.
1.10 Regular Function
• A function f of the complex number z is analytic at a point z0 if its derivative exists not only at z0
but also at each point z in some neighborhood of z0 .
• A regular (or holomorphic or analytic) function is defined as a complex-valued differentiable
function on an open set D of C. That is, a function is regular on a region D of C, if the complex-valued
function is complex differentiable at every point in the set D of C
• If a function f is analytic at a point, then its derivatives of all orders exist (infinitely differentiable)
and are themselves analytic there.
• The main result about a regular function is that at any point of the domain of definition, D, the
regular function can be expanded in a Taylor’s series that converges in the largest open disk that
does not contain any singularity.
• Another fundamental result is that for two regular functions the composition f (g(z)) is again
regular provided the range of g is in the domain of f
• If two functions f (z) and g(z) are analytic in a domain D, then their sum and their product are both
analytic in D.
• The quotient f(z)/g(z) is also analytic in D provided that g(z) ≠ 0 for any z in D.
• An entire function is a function that is analytic at each point in the entire complex plane
• Every polynomial is an entire function.
• Hence the quotient P(z)/Q(z) of two polynomials is analytic in any domain throughout which Q(z) ≠ 0.
1.11 Convergence of a Power Series
A power series around z_0 is
f(z) = ∑_{n=0}^{∞} a_n (z − z_0)ⁿ = a₀ + a₁(z − z_0) + a₂(z − z_0)² + ⋯
• The derivative is given by term-by-term differentiation
f′(z) = ∑_{n=1}^{∞} n a_n (z − z_0)^{n−1}

1.11.1 Ratio test
Consider a series ∑_{n=0}^{∞} c_n. If L = lim_{n→∞} |c_{n+1}/c_n| exists, then:
• If L < 1 then the series converges absolutely.
• If L > 1 then the series diverges.
• If L = 1 then the test gives no information.
Example
• Consider the geometric series
f(z) = 1 + z + z² + z³ + ⋯
for which L = |z|, so the series converges absolutely when |z| < 1.
• Consider the exponential series
f(z) = ∑_{n=0}^{∞} zⁿ/n!
• The limit of the absolute ratios of consecutive terms is
L = lim_{n→∞} [|z|^{n+1}/(n+1)!] / [|z|ⁿ/n!] = lim_{n→∞} |z|/(n+1) = 0
so the exponential series converges absolutely for every z ∈ ℂ.
1.11.2 Root test
Consider a series ∑_{n=0}^{∞} c_n. If L = lim_{n→∞} |c_n|^{1/n} exists, then:
• If L < 1 then the series converges absolutely.
• If L > 1 then the series diverges.
• If L = 1 then the test gives no information.
Example
• Consider the geometric series
1 + z + z² + z³ + ⋯
for which c_n = zⁿ, so L = lim_{n→∞} |zⁿ|^{1/n} = |z|.
• Thus, the root test agrees that the geometric series converges when |z| < 1.
1.12 Taylor's Theorem
Suppose f is analytic on a domain G and z_0 ∈ G. Then f can be written as a power series
f(z) = ∑_{n=0}^{∞} a_n (z − z_0)ⁿ
where the series converges on any disk |z − z_0| < r contained in G. Furthermore, we have formulas for the coefficients:
f(z) = f(z_0) + [f′(z_0)/1!](z − z_0) + [f″(z_0)/2!](z − z_0)² + ⋯
That is, the Taylor series of a real or complex-valued function f(z) that is infinitely differentiable (regular) at a real or complex number z_0.
In the more compact sigma notation, this can be written as
∑_{n=0}^{∞} [f⁽ⁿ⁾(a)/n!] (x − a)ⁿ
where f⁽ⁿ⁾(a) denotes the n-th derivative of f evaluated at the point a. (The derivative of order zero of f is defined to be f itself, and (x − a)⁰ and 0! are both defined to be 1.)
• The special case of the series when z_0 = 0 is called the Maclaurin series.
• Notice that it holds with equality: it is not an approximation, but the function must be infinitely differentiable (which means that the function is regular). That is why we have an infinite sum.
• If the function is not infinitely differentiable, we can have an approximation of order n.
• nth Taylor polynomial: The partial sum formed by the first n terms of a Taylor series is a polynomial of degree n. Taylor polynomials are approximations of a function, which generally become better as n increases.
1.13 Complex Random Variable
A complex random variable is a variable of the form z = x + iy, in which x and y are real random variables.

1.13.1 Expectation
• The expectation of a complex random variable is defined based on the definition of the expectation of a real random variable:
E[z] = E[x] + iE[y]
• Note that the expectation of a complex random variable does not exist if E[x] or E[y] does not exist.
• if the complex random variable Z has a probability density function fZ (z), then the expectation is
given by
E[z] = ∫_ℂ z · f_Z(z) dz
• If the complex random variable Z has a probability mass function pZ (z), then the expectation is given
by
E[z] = ∑_z z · p_Z(z)
Properties
• Whenever the expectation of a complex random variable exists, taking the expectation and complex conjugation commute:
E[z̄] = \overline{E[z]}
1.13.2 Variance
• The variance is defined as
Var[z] = E[|z − E[z]|²] = E[|z|²] − |E[z]|²
Properties
• The variance is always a nonnegative real number
• It is equal to the sum of the variances of the real and imaginary parts of the complex random variable:
Var[z] = Var[x] + Var[y]
• The variance of a linear combination of complex random variables may be calculated using the following formula:
Var[∑_{k=1}^{N} a_k Z_k] = ∑_{i=1}^{N} ∑_{j=1}^{N} a_i ā_j Cov[Z_i, Z_j]
1.13.3 Pseudo-variance
• The pseudo-variance is a special case of the pseudo-covariance and is given by
E[(z − E[z])²] = E[z²] − (E[z])²
• Unlike the variance of z, which is always real and nonnegative, the pseudo-variance of z is in general complex.
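A quick simulation sketch of these two quantities (using numpy; the distribution below, with independent real and imaginary parts, is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.normal(size=100_000) + 1j * rng.normal(scale=2.0, size=100_000)

    var = np.mean(np.abs(z - z.mean()) ** 2)   # variance: real, = Var(x) + Var(y)
    pvar = np.mean((z - z.mean()) ** 2)        # pseudo-variance: complex in general
    print(var)    # close to 1 + 4 = 5
    print(pvar)   # close to 1 - 4 = -3 (real here because x and y are independent)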
1.13.4 Covariance and Complementary Covariance
• The covariance between two complex random variables z and w is defined as:
K_{zw} = Cov[z, w] = E[(z − E[z]) \overline{(w − E[w])}]
• The complementary covariance (or pseudo-covariance) is defined without the conjugate:
J_{zw} = E[(z − E[z])(w − E[w])]
1.13.5 Uncorrelatedness
Two complex random variables z and w are called uncorrelated if their covariance and complementary covariance are both zero:
K_{zw} = J_{zw} = 0
Thus,
E[z w̄] = E[z] E[w̄] and E[zw] = E[z] E[w]

1.13.6 Orthogonality
As with real random variables, complex quantities are said to be orthogonal if:
E[z w̄] = 0
As always, this does not imply zero covariance unless the means of the variables are zero.
2 Complex functions
2.1 The natural logarithm of a complex number
In complex analysis, a complex logarithm of the non-zero complex number z is a number
w = ln z
such that e^w = z.
• When is it undefined? If z is a real number, then we know that w is undefined for z ≤ 0
• Thus, we would be tempted to say that this function is undefined when z ≤ 0, but since z is complex,
it is not that simple.
• From the polar form, we know that z = |z|e^{iθ}. Thus,
ln z = ln(|z|e^{iθ})
ln z = ln |z| + ln e^{iθ}
ln z = ln |z| + iθ ln e
ln z = ln |z| + i arg(z)
Thus, ln z is defined if
|z| > 0
– This means that we want the radius of z on the complex plane to be greater than zero
– Since |z| is always nonnegative by definition, this condition only excludes z = 0; in particular, negative values of z are allowed
– That is why we say that ln z is defined for z ∈ ℂ\{0}, which means that the function is defined for all complex numbers (positive and negative) except for 0
• However, we have a problem: arg(z) is not unique, which means that ln is not the inverse of the
exponential function.
• For the function to be single-valued, we need to define the Principal Logarithm

2.1.1 Principal Logarithm
The principal logarithm is the complex logarithm using the principal value, Arg(z). Thus, another way to write it is
Ln z = ln |z| + i Arg(z)
• Thus, the function is unique
• Now, we have an expression for the inverse function of e^z
Example 1
• Let’s find the principal logarithm for z = i
Ln i = ln |i| + i Arg(i)
• Recall
z = x + iy
|z| = √(x² + y²)
Thus,
|i| = |0 + i(1)| = √(0² + 1²) = 1
• For the argument,
Arg(i) = tan⁻¹(y/x) = arctan(1/0)
so, taking the limit as the adjacent side goes to zero,
Arg(i) = π/2
• Thus,
Ln i = ln 1 + i(π/2)
Ln i = i(π/2)
Example 2
• Let’s find the principal logarithm for z = 1 + i
Ln(1 + i) = ln |1 + i| + i Arg(1 + i)
• Recall
z = x + iy
|z| = √(x² + y²)
Thus,
|1 + i| = |1 + i(1)| = √(1² + 1²) = √2
• Notice that Arg(1 + i) can be found directly on the complex plane to be π/4. However, we can also find it by:
Arg(1 + i) = tan⁻¹(y/x) = arctan(1/1) = π/4
• Thus,
Ln(1 + i) = ln √2 + i(π/4)
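Both examples can be checked directly in Python, where cmath.log returns exactly this principal logarithm (a sketch):

    import cmath

    print(cmath.log(1j))        # 1.5707963267948966j = i*pi/2
    print(cmath.log(1 + 1j))    # (0.3465...+0.7853...j) = ln(sqrt(2)) + i*pi/4
    print(cmath.phase(1 + 1j))  # 0.7853... = Arg(1 + i) = pi/4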
2.1.2 When is the function continuous?
Ln z = ln |z| + i Arg(z)
The function f : ℂ\{0} → ℂ given by f(z) = Ln z is continuous at all z except those along the negative real axis.
• Let’s analyze it by its components
• ln |z| is clearly continuous for all z ∈ C\{0} since the modulus operator, |.|, is always positive and we
are already excluding the case z = 0, so the argument of ln(.) is always greater than 0.
• The question is if the second component is continuous or not.
• The continuity of Arg(z)
– Arg(z) is noncontinuous on any point on the negative real axis by its definition.
– Recall that Arg(z) is an angle such that:
−π < Arg(z) ≤ π
So, the angle can never be −π. This is exactly what ensures that Arg is single-valued: if y = 0 and x < 0, the angle is always π and never −π. Allowing −π would yield two values for this case, π and −π
– To better understand the locations of π and −π let’s take a look at Figure 3
– Now, when analyzing the limits we should take into account the sign of the angle, since it indicates whether the angle is measured clockwise (Arg(z) < 0) or counterclockwise (Arg(z) > 0)
Figure 3: Arg(z)
– Let’s go to Figure 3 and see the point on the unit circle for the blue line. Let’s suppose it is
(x0 , y0 ). We can see that by fixing x = x0 , if approaching y from above to y0 , the limit for Arg(.)
is π/4. When approaching y from below to y0 , the limit is also π/4
– From Figure 3, we can see that when x < 0, and y goes to zero from above, the limit of the angle
is π
– However, when x < 0, and y goes to zero from below, the limit of the angle is −π
• Therefore, Arg(z) is discontinuous at each point on the negative real axis:
– Let z = x₀ + iy for some x₀ < 0 fixed
– If y approaches 0 from above, then Arg(z) ↑ π,
– Whereas if y approaches 0 from below, then Arg(z) ↓ −π
• Thus, we have that Ln(z) is continuous and well-defined for all z ∈ ℂ\L, in which:
L = {z ∈ ℂ : Re(z) ≤ 0 and Im(z) = 0}
• Thus, we see that we need to exclude more than 0 in order to get Ln(z) to be continuous
2.1.3 Complex differentiability of natural logarithm
Ln z = ln |z| + i Arg(z)
• Recall z = x + iy, so to find the derivative, we prefer to analyze f as a function of (x, y)
• To rewrite f(z) as f(x, y), we proceed to replace |z| and Arg(z):
Ln z = ln √(x² + y²) + i tan⁻¹(y/x)
Ln z = (1/2) ln(x² + y²) + i tan⁻¹(y/x)
Then, we have that:
u(x, y) = (1/2) ln(x² + y²)
v(x, y) = tan⁻¹(y/x)
• Remember we have that
d(arctan(x))/dx = 1/(1 + x²)
and that for y = arctan(1/x), we differentiate using the chain rule, which states that (d/dx)[f(g(x))] is f′(g(x)) g′(x), where f(x) = arctan(x) and g(x) = 1/x. Thus,
dy/dx = [1/(1 + (1/x)²)] (d/dx)(1/x)
Thus, we have
u_x(x, y) = (1/2) [2x/(x² + y²)] = x/(x² + y²)
v_x(x, y) = [1/(1 + (y/x)²)] (−y/x²) = −[x²/(x² + y²)] (y/x²) = −y/(x² + y²)
• Therefore we have:
d(Ln z)/dz = x/(x² + y²) + i[−y/(x² + y²)]
d(Ln z)/dz = (x − iy)/(x² + y²)
TRICK: Use the notable product (difference of two squares) to decompose the denominator:
(x + iy)(x − iy) = x² − (iy)² = x² − (√−1)² y² = x² − (−1)y² = x² + y²
Thus, we have:
d(Ln z)/dz = (x − iy)/[(x + iy)(x − iy)]
d(Ln z)/dz = 1/(x + iy)
d(Ln z)/dz = 1/z
• So it does not exist for z = 0. Now, we only need to check if the derivative does not exist for any other
values
• We proceed to check the Cauchy-Riemann equations:
ux (x, y) = vy (x, y)
• We have that:
u_y(x, y) = (1/2) [2y/(x² + y²)]
u_y(x, y) = y/(x² + y²)
v_y(x, y) = [1/(1 + (y/x)²)] (1/x)
v_y(x, y) = [x²/(x² + y²)] (1/x)
v_y(x, y) = x/(x² + y²)
• Thus we have that the Cauchy-Riemann equations hold for any values of x and y:
u_x(x, y) = x/(x² + y²) = v_y(x, y)
v_x(x, y) = −y/(x² + y²) = −u_y(x, y)
Thus, the derivative exists as long as x² + y² ≠ 0, which is implied by z ≠ 0, which in turn is implied by the condition |z| > 0 for the natural logarithm to be defined
• Therefore,
1. the natural logarithm of z is complex differentiable for all z ∈ C\L
2. the natural logarithm of z is analytic-regular-holomorphic for all z ∈ C\L
2.2 The natural exponential of a complex number
f(z) = e^z
• When is it undefined? If z ∈ ℝ, we know it is always defined
• Would that change now that z ∈ ℂ?
• Recall z = x + iy, so
f(z) = e^{x+iy}
f(z) = e^x e^{iy}
f(z) = e^x (cos y + i sin y)
Sine and cosine are defined for every real number, and e^x is defined for any x ∈ ℝ. Thus f(z) is defined for any x, y ∈ ℝ
• Thus, the natural exponential function of a complex number is always defined, for any z ∈ ℂ
2.2.1 Complex differentiability
For any z ∈ C, we have that the natural exponential function is a well-defined function:
f (z) = ez
• Recall z = x + iy, so to find the derivative, we prefer to analyze
f(x, y) = e^x e^{iy} = e^x (cos y + i sin y)
so that
u(x, y) = e^x cos y
v(x, y) = e^x sin y
• Remember that
d(sin x)/dx = cos x
d(cos x)/dx = −sin x
Thus, we have
u_x(x, y) = e^x cos y
v_x(x, y) = e^x sin y
• Therefore we have:
d(e^z)/dz = e^x cos y + i e^x sin y
d(e^z)/dz = e^x (cos y + i sin y)
Thus, we have:
d(e^z)/dz = e^x e^{iy}
d(e^z)/dz = e^{x+iy}
Recall: z = x + iy, so:
d(e^z)/dz = e^z
• As the derivative is the natural exponential function itself, it is defined for all z ∈ ℂ. To double-check, we proceed to check the Cauchy-Riemann equations
• We proceed to check the Cauchy-Riemann equations:
ux (x, y) = vy (x, y)
• We have that:
uy (x, y) = ex (− sin y)
vy (x, y) = ex (cos y)
• Thus we have that the Cauchy-Riemann equations hold for any values of x and y:
u_x(x, y) = e^x cos y = v_y(x, y)
v_x(x, y) = e^x sin y = −u_y(x, y)
• Therefore,
1. the natural exponential of z is complex differentiable for all z ∈ C
2. the natural exponential of z is analytic-regular-holomorphic for all z ∈ C
2.3 The complex valued function: (1 − αz)^{−1}
f(z) = 1/(1 − αz)
• When is it undefined? If z ∈ ℝ, we know it is not defined when z = 1/α
• As usual, we would like to find a way to rewrite f(z) using the exponential and logarithm functions. Thus,
f(z) = e^{ln[1/(1−αz)]}
f(z) = e^{−ln(1−αz)}
– The natural exponential function, e^z, is complex differentiable for all z ∈ ℂ
– However, the natural logarithm, ln z, is not complex differentiable for all z ∈ ℂ
• So, we only need to analyze the differentiability of ln(1 − αz)
• To get a single-valued function, we analyze the principal logarithm, Ln(1 − αz), which requires:
|1 − αz| > 0
that is,
1 ≠ αz
That is,
z ≠ 1/α
• To find the derivative we can follow the same analysis as in 2.1.2, by analyzing Ln z*, in which:
z* = 1 − αz = (1 − αx) + i(−αy)
From 2.1.2, the problematic region is where the real part of z* is nonpositive:
Re(z*) ≤ 0
1 − αx ≤ 0
x ≥ 1/α
Recall that z = x + iy, so the real part of z is x. From above we have that:
Re(z) ≥ 1/α
The imaginary part being zero in z* is the same as the imaginary part being zero in z (for real α ≠ 0). So, by rewriting L in terms of z, we have that:
L = {z ∈ ℂ : Re(z) ≥ 1/α and Im(z) = 0}
Thus, Ln(1 − αz) is well defined and continuous for all z ∈ ℂ\L
• We know that Ln z* is complex differentiable for all z* ∈ ℂ\L_{z*}
• Thus, we have that Ln(1 − αz) is complex differentiable for all z ∈ ℂ\L
• That is, Ln(1 − αz) is analytic for all z ∈ ℂ\L
• Therefore, f(z) = 1/(1 − αz) is analytic for all z ∈ ℂ\L
2.3.2 Taylor Expansion
f(z) = 1/(1 − αz)
• is well defined
• is analytic
• As f(z) is analytic everywhere inside ℂ\L, we can use Taylor's Theorem to express it as a power series around any z_0 ∈ ℂ\L:
f(z) = ∑_{n=0}^{∞} a_n (z − z_0)ⁿ = ∑_{n=0}^{∞} [f⁽ⁿ⁾(z_0)/n!] (z − z_0)ⁿ
• Define g(z) = ln(1 − αz). We know that g(z) is analytic everywhere inside ℂ\L, so we can use the Taylor expansion around z_0 = 0:
ln(1 − αz) = −(α/1)z − (α²/2)z² − (α³/3)z³ − (α⁴/4)z⁴ − ⋯ − (αⁿ/n)zⁿ − ⋯
So,
ln(1 − αz) = −∑_{n=1}^{∞} (αⁿ/n) zⁿ
2.3.3 Convergence of the Taylor Expansion
• To apply the ratio test, we need to find the limit of the absolute ratios of consecutive terms, L:
L = lim_{n→∞} |[α^{n+1}/(n+1)] z^{n+1}| / |(αⁿ/n) zⁿ|
L = lim_{n→∞} |αz| n/(n+1)
We can take out |αz| since it does not depend on n, so it is not affected by the limit:
L = |αz| lim_{n→∞} n/(n+1)
L = |αz| lim_{n→∞} 1/1   (by L'Hôpital's rule⁵)
L = |αz| · |1|
L = |αz|
The ratio test requires, for convergence:
L < 1
|αz| < 1
⁵ L'Hôpital's rule: if lim_{x→c} f(x) = lim_{x→c} g(x) = 0 or ±∞, g′(x) ≠ 0 for all x in I with x ≠ c, and lim_{x→c} f′(x)/g′(x) exists, then
lim_{x→c} f(x)/g(x) = lim_{x→c} f′(x)/g′(x)
That is, the series converges if:
|α||z| < 1
• Thus, we have that the series g(z), and therefore the series f(z), converge for any z such that |z − 0| < |α|⁻¹.
• One big difference between this function and 1/(1 − z) is that convergence can be achieved when |z| = 1 (i.e., z is on the unit circle) as long as
|α| < 1
This is an extremely important result, as we will see when analyzing the lag operator.
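A numerical sketch of this result: with |α| < 1, the Taylor series of f(z) = 1/(1 − αz) converges even for z on the unit circle (the values α = 0.9 and z = e^{i·1} below are arbitrary choices):

    import numpy as np

    alpha, z = 0.9, np.exp(1j * 1.0)         # |z| = 1 and |alpha| < 1
    n = np.arange(200)
    partial_sum = np.sum((alpha * z) ** n)   # truncated sum of alpha^n z^n
    print(partial_sum)
    print(1.0 / (1.0 - alpha * z))           # approximately the same complex number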
3 Lag polynomial
• A lag polynomial is a polynomial in which the variable is the lag operator denoted by L
• The lag operator or backshift operator, B, operates on an element of a time series to produce the
previous element:
Lyt = yt−1
• The lag operator (as well as backshift operator) can be raised to arbitrary integer powers so that:
L2 yt = yt−2
L−1 yt = yt+1
• The lag polynomial will be useful when analyzing the properties of a time series.
• For
∑_{j=−∞}^{∞} ψ_j X_{t−j}
we can write
∑_{j=−∞}^{∞} ψ_j X_{t−j} = ∑_{j=−∞}^{∞} ψ_j Lʲ X_t = ψ(L) X_t
• As it is a polynomial, we will use all the tools learned in Sections 1 and 2. For instance, we would like to see if we can apply Taylor's Theorem:
ψ(L) = ∑_{j=0}^{∞} ψ_j Lʲ = 1/(1 − ρL)
• As the time series is a random variable, we will need to use the concept of convergence of random
variables that we will learn in Section 4.
• The idea is to find the conditions for the lag polynomial such that a time series is stationary
• In short, we will try to express a time series in terms of a lag polynomial. As the time series is a random variable, we will need to analyze convergence in probability or in moments. As we will see in Section 5, certain features of the moments of the variable can tell us whether the variable is stationary (e.g., second moment convergence is a necessary but not a sufficient condition for the series to be stationary). Then, we will work out what conditions we need on the lag polynomial such that the time series is stationary
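To preview this program, here is a simulation sketch for the AR(1) case, where inverting (1 − ρL) yields the MA(∞) representation y_t = ∑_j ρʲ ε_{t−j} (assuming |ρ| < 1; the sample size, seed, and truncation lag are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(1)
    rho, T, J = 0.8, 500, 60
    eps = rng.normal(size=T)

    # AR(1) built recursively: y_t = rho * y_{t-1} + eps_t
    y = np.zeros(T)
    y[0] = eps[0]
    for t in range(1, T):
        y[t] = rho * y[t - 1] + eps[t]

    # truncated MA(infinity) representation: y_t ~ sum_{j<J} rho^j eps_{t-j}
    y_ma = np.array([sum(rho ** j * eps[t - j] for j in range(min(J, t + 1)))
                     for t in range(T)])
    print(np.max(np.abs(y - y_ma)[J:]))   # tiny: truncation error of order rho^J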
• For example, the sequence
1/2, 2/3, 3/4, ⋯, n/(n+1), ⋯
is defined as
a_n = n/(n+1), for n = 1, 2, 3, ⋯
and it converges to the limit L if
lim_{n→∞} a_n = L
4.3 Convergence in Probability
• Let {X_n : n ≥ 1} be a random sequence of vectors on ℝᵏ
• That is, k elements in each vector; that is, k random variables.
• When k = 1, we are in the simple case of a sequence of random variables (each vector is one-dimensional, which means that each vector is only one variable)
• Let X be a random vector on ℝᵏ
• X_n is said to converge in probability to X, denoted by:
X_n →ᵖ X,
if, as n → ∞,
P(|X_n − X| > ε) → 0
for any
ε > 0
where |·| is the usual Euclidean norm (also called the L2 norm), which gives the ordinary distance from the origin to the point X, a consequence of the Pythagorean theorem:
|X| = √(x₁² + ⋯ + x_k²)
in which √· denotes the positive root
• Equivalently, X_n is said to converge in probability to X if
lim_{n→∞} P(|X_n − X| > ε) = 0
for any
ε > 0
• Equivalently, X_n is said to converge in probability to X if, for any δ, ε > 0, there exists N ∈ ℕ such that
P(|X_n − X| > ε) ≤ δ
for all n ≥ N
• It is worth pointing out that Markov's inequality is very useful in many proofs involving convergence in probability.
4.3.1 Some intuition
• Let's say X is a constant, something finite
• Let's say X_n is a simple sum:
X_n = ∑_{i=1}^{n} x_i
4.4 Convergence in Probability of an Infinite Sum of Random Variables: Absolutely Summable
Theorem 3 If {X_t} is any sequence of random variables such that
sup_t E[|X_t|] < ∞
and if
∑_{j=−∞}^{∞} |ψ_j| < ∞,
then the series
Y_t = ψ(L) X_t = ∑_{j=−∞}^{∞} ψ_j Lʲ X_t = ∑_{j=−∞}^{∞} ψ_j X_{t−j}
converges.
Proof
• We will skip this one, since Theorem 4 implies Theorem 3
• X_n is said to converge in the q-th moment (in Lq) to X, denoted
X_n →^{Lq} X
if, as n → ∞,
E[|X_n − X|^q] → 0
• By Jensen's inequality we can show that convergence in higher moments implies convergence in lower moments
• Convergence in the first moment (L1):
X_n →^{L1} X
if, as n → ∞,
E[|X_n − X|] → 0
• Let’s say X is something finite, then we have that Xn converges in first moment to something finite
• Thus, we have that:
E [|Xn |] < ∞
as n goes to infinity.
• Recall that we define expectations by sums or integrals only if they are absolutely summable or integrable.
• Therefore, we have that a random variable Y has an expectation only if |Y| has an expectation:
E[Y] = ∑_y y p(y) < ∞
if and only if
E[|Y|] = ∑_y |y| p(y) < ∞
Thus,
E[X_n] < ∞
as n goes to infinity.
• Convergence in the second moment (L2):
X_n →^{L2} X
if, as n → ∞,
E[|X_n − X|²] → 0
• Let’s say X is something finite, then we have that Xn converges in second moment to something finite.
• Thus, we have that the second moment of Xn exists.
• This convergence carries out a great implication: the variance exist.
• 2nd moment convergence implies 1st moment convergence. Thus, the first moment is also finite.
• The variance of Xn is given by:
By definition of the expectation, we know that the expectation of Xn2 exists if and only if the expectation
2
of |Xn | exists. Thus, we have that the first component of the variance exists. The second component
42
exists because convergence of second moment implies the expectation of |Xn | exists which implies that
expectation of Xn exists and so the square of this expectation.
Theorem 4 If {X_t} is any sequence of random variables such that
sup_t E[|X_t|²] < ∞
and if
∑_{j=−∞}^{∞} |ψ_j| < ∞,
then the series
Y_t = ψ(L) X_t = ∑_{j=−∞}^{∞} ψ_j Lʲ X_t = ∑_{j=−∞}^{∞} ψ_j X_{t−j}
converges in mean square.
4.6.2 Proof
• In this case we want to show that the infinite sum converges in mean square.
• That is, we want to show that the infinite sum converges in second moment to something finite.
• We will denote that something finite as K.
• Thus, we want to check whether the following holds:
E[|∑_{j=−n}^{n} ψ_j X_{t−j} − K|²] → 0
as n goes to infinity.
• This trick is the same as the one used in Proposition 3.1.1 in Brockwell and Davis (1991)
• Let’s say K is the finite variance if it exists. So we are actually only requiring
2
∞
X
ψj Xt−j = Finite
E
j=−∞
Now recall the definition of having a finite second moment: E|Z|2 < ∞.
Notice that E|Z|2 = E(Z)2 only for Z ∈ R.
If Z ∈ C, then we cannot say that because of the definition of a modulus.
I will explain it in more detail in the following lines.
Thus, we are looking for:
2
∞
X
ψj Xt−j < ∞
E
j=−∞
Again, we would love to use here the beautiful Triangle Inequality, but this time we have the Euclidean norm raised to the power of 2, so we cannot. For this case, we will use the beautiful property of |·|².
• Recall that the Euclidean norm or L2 norm (or distance) of a k-dimensional vector x is given by
|x| = √(∑_{i=1}^{k} x_i²)
in which x_i denotes an element of the k-dimensional vector. Thus, the square of the L2 norm is:
|x|² = (√(∑_{i=1}^{k} x_i²))² = ∑_{i=1}^{k} x_i²
For a complex number, the analogous property is:
|z|² = z z̄
• Notice that in this case, we have a one-dimensional vector. Let's allow both the random variables and the coefficients to be complex numbers. Then, using |z|² = z z̄, a direct application of the above gives:
E[|∑_{j=−n}^{n} ψ_j X_{t−j}|²] = E[(∑_{j=−n}^{n} ψ_j X_{t−j})(∑_{j=−n}^{n} ψ̄_j X̄_{t−j})]
= E[∑_{j=−n}^{n} ∑_{k=−n}^{n} ψ_j ψ̄_k X_{t−j} X̄_{t−k}]
= ∑_{j=−n}^{n} ∑_{k=−n}^{n} ψ_j ψ̄_k E[X_{t−j} X̄_{t−k}]
Notice that the last term looks like a covariance so we would like to use the Covariance Inequality.
It is not a covariance because it does not include the means.
• The Covariance Inequality for real random variables states that:
|Cov(X, Y)|² ≤ Var(X) Var(Y)
which comes directly from applying the Cauchy-Schwarz inequality, which can be written using only the inner product:
|⟨X, Y⟩|² ≤ ⟨X, X⟩ ⟨Y, Y⟩
or, by taking the square root of both sides of the above inequality, the Cauchy-Schwarz inequality can be written using the norm and inner product:
|⟨X, Y⟩| ≤ ‖X‖ ‖Y‖.
Defining
⟨X, Y⟩ = E(XY)
‖X‖ = √⟨X, X⟩ = √E(XX)
the inequality reads
|E(XY)|² ≤ E[X²] E[Y²]
but we can easily recenter X and Y; let µ = E(X) and ν = E(Y), then:
|Cov(X, Y)|² = |E((X − µ)(Y − ν))|²
= |⟨X − µ, Y − ν⟩|²
≤ ⟨X − µ, X − µ⟩ ⟨Y − ν, Y − ν⟩
= E[(X − µ)²] E[(Y − ν)²]
= Var(X) Var(Y)
For the case of complex random variables, we also proceed to use the Cauchy-Schwarz inequality, defining the inner product as⁶:
⟨Z, W⟩ = E(Z W̄)
So,
|E[Z W̄]| ≤ √(E[|Z|²] E[|W|²])
• We now return to our double sum:
∑_{j=−n}^{n} ∑_{k=−n}^{n} ψ_j ψ̄_k E[X_{t−j} X̄_{t−k}]
Applying the Cauchy-Schwarz inequality for complex random variables to each expectation, we have that:
|∑_{j=−n}^{n} ∑_{k=−n}^{n} ψ_j ψ̄_k E[X_{t−j} X̄_{t−k}]| ≤ ∑_{j=−n}^{n} ∑_{k=−n}^{n} |ψ_j ψ̄_k| √(E[|X_{t−j}|²] E[|X_{t−k}|²])
We would like to have X_t i.i.d. with E|X_t|² = θ < ∞, so that E[|X_{t−j}|²] = E[|X_{t−k}|²] = θ and we could factor it out of the sum. However, we do not have an i.i.d. process, since this is a more general case; but we do have that:
sup_t E[|X_t|²] < ∞
which means that the largest possible value of E|X_t|² for any t is finite. Thus, we can take the extreme case in which all the E|X_t|² take that extreme value:
∑_{j=−n}^{n} ∑_{k=−n}^{n} |ψ_j ψ̄_k| √(E[|X_{t−j}|²] E[|X_{t−k}|²]) ≤ ∑_{j=−n}^{n} ∑_{k=−n}^{n} |ψ_j ψ̄_k| √(sup_t E[|X_t|²] sup_t E[|X_t|²])
so we can factor the expectation out of the double sum (since we are assuming all the expectations take the supremum value, it is no longer indexed within the sum):
∑_{j=−n}^{n} ∑_{k=−n}^{n} |ψ_j ψ̄_k| √(E[|X_{t−j}|²] E[|X_{t−k}|²]) ≤ sup_t E[|X_t|²] ∑_{j=−n}^{n} ∑_{k=−n}^{n} |ψ_j ψ̄_k|
• Notice that:
|∑_{j=−n}^{n} ψ_j|² = (∑_{j=−n}^{n} ψ_j)(∑_{j=−n}^{n} ψ̄_j) = ∑_{j=−n}^{n} ∑_{k=−n}^{n} ψ_j ψ̄_k
and that, by the Triangle Inequality:
|∑_{j=−n}^{n} ψ_j| ≤ ∑_{j=−n}^{n} |ψ_j|
• Since |ψ_j ψ̄_k| = |ψ_j| |ψ_k|, the double sum of moduli factors as:
∑_{j=−n}^{n} ∑_{k=−n}^{n} |ψ_j ψ̄_k| = (∑_{j=−n}^{n} |ψ_j|)(∑_{k=−n}^{n} |ψ_k|) = (∑_{j=−n}^{n} |ψ_j|)²
Thus, putting the two previous displays together:
|∑_{j=−n}^{n} ∑_{k=−n}^{n} ψ_j ψ̄_k E[X_{t−j} X̄_{t−k}]| ≤ sup_t E[|X_t|²] (∑_{j=−n}^{n} |ψ_j|)²
• As n → ∞, the right-hand side converges to
sup_t E[|X_t|²] (∑_{j=−∞}^{∞} |ψ_j|)²
and, by assumption,
∑_{j=−∞}^{∞} |ψ_j| < ∞
which implies
(∑_{j=−∞}^{∞} |ψ_j|)² < ∞
Therefore, we have:
sup_t E[|X_t|²] (∑_{j=−∞}^{∞} |ψ_j|)² < ∞
Working backwards, this establishes, uniformly in n:
2. ∑_{j=−n}^{n} ∑_{k=−n}^{n} |ψ_j ψ̄_k| √(E[|X_{t−j}|²] E[|X_{t−k}|²]) < ∞
3. |∑_{j=−n}^{n} ∑_{k=−n}^{n} ψ_j ψ̄_k E[X_{t−j} X̄_{t−k}]| < ∞
4. ∑_{j=−n}^{n} ∑_{k=−n}^{n} ψ_j ψ̄_k E[X_{t−j} X̄_{t−k}] < ∞
5. E[|∑_{j=−n}^{n} ψ_j X_{t−j}|²] < ∞
6. Hence,
lim_{n→∞} E[|∑_{j=−n}^{n} ψ_j X_{t−j}|²] = Finite
or
∑_{j=−n}^{n} ψ_j X_{t−j} →^{L2} X
• Thus, we have that the infinite sum converges in mean square/second moment/L-2
• Thus, it converges in the first moment, which implies convergence in probability.
• Notice that for the variance of the infinite sum to be finite we have not needed ∑_{j=−∞}^{∞} |ψ_j|² < ∞; it was enough to have ∑_{j=−∞}^{∞} |ψ_j| < ∞, as long as the second moment of X_t is finite.⁷
Theorem 5 If {X_t} is any sequence of random variables such that
sup_t E[|X_t|²] < ∞
and if
∑_{j=−∞}^{∞} |ψ_j|² < ∞,
then the series
Y_t = ψ(L) X_t = ∑_{j=−∞}^{∞} ψ_j Lʲ X_t = ∑_{j=−∞}^{∞} ψ_j X_{t−j}
converges in mean square.
4.7.2 Proof
For a more advanced time series econometric course!
The process {X_t} is a stationary process if:
1. For all t ∈ ℤ, the second moment exists:
E[|X_t|²] < ∞
2. For all t ∈ ℤ, the mean is independent of t:
E[X_t] = m
The first condition implies that the first moment exists. However, this condition additionally requires it to be independent of t and constant over time, which implies that the mean is the same for all X_t, ∀t ∈ ℤ
3. Let's define the autocovariance function as a function of two variables:
γ_x(r, s) = Cov(X_r, X_s) = E[(X_r − E[X_r])(X_s − E[X_s])]
Then, for all r, s, and t ∈ ℤ, the autocovariance between X_t and X_{t+s} must only be a function of the distance between the two observations, s:
γ_x(t, t + s) = g(s)
such that:
γ_x(r + t, s + t) = γ_x(r, s)
since the covariance between X_{r+t} and X_{s+t} is only a function of the distance between both observations, s − r, which is the same as the distance between the observations X_r and X_s. Since both covariances have the same distance, they should be equal in order for the process to be stationary. In particular,
γ_x(r, s) = γ_x(r − s, 0)
for all r, s ∈ ℤ.
• Thus, it is convenient to redefine the autocovariance function of a stationary process as a function of just one variable:
γ_x(h) ≡ γ_x(h, 0) = Cov(X_{t+h}, X_t)
for all t, h ∈ ℤ
• The function γ_x(·) will be referred to as the autocovariance function of {X_t}
• γ_x(h) is the value of the autocovariance function of {X_t} at lag h.
• The autocorrelation function (acf) of {X_t} is defined analogously as the function whose value at lag h is:
ρ_x(h) ≡ γ_x(h)/γ_x(0) = Corr(X_{t+h}, X_t)
for all t, h ∈ ℤ
5.4 Complex-Valued Stationary Time Series
• Processes encountered in practice are nearly always real-valued
• However, it is mathematically simpler in spectral analysis to treat them as special cases of complex-
valued processes.
The process {X_t} is a complex-valued stationary process if
1. For all t ∈ ℤ, the second moment exists:
E[|X_t|²] < ∞
2. For all t ∈ ℤ, the mean is independent of t:
E[X_t] = m
3. For all t, h ∈ ℤ, E[X_{t+h} X̄_t] is independent of t
Given that for the process to be stationary we need condition 2, E[X_{t+h}] = E[X_t] = m, then for condition 3 we only need:
E[X_{t+h} X̄_t] to be independent of t
This is what makes it different from our usual definition with real-valued variables: when dealing with complex-valued random variables, the covariance formula requires the conjugate of one of the variables.
γ(h) = E[X_{t+h} X̄_t] − E[X_{t+h}] E[X̄_t]
for all t, h ∈ ℤ
• The function γ_x(·) will be referred to as the autocovariance function of {X_t} and has the following properties⁸:
γ(0) ≥ 0
|γ(h)| ≤ γ(0) for all integers h
γ(·) is a Hermitian function (i.e., γ(h) = \overline{γ(−h)})
⁸ A Hermitian function is a complex function with the property that its complex conjugate is equal to the original function with the variable changed in sign: \overline{f(x)} = f(−x)
5.6 Stationarity of an Infinite Sum of Random Variables (Real-valued/Complex-valued)
Theorem 6 If
1. {X_t} is a stationary process with autocovariance function γ_X(·)
2. ∑_{j=−∞}^{∞} |ψ_j| < ∞,
3. Y_t = ∑_{j=−∞}^{∞} ψ_j X_{t−j}
then, for X ∈ ℝ, the process {Y_t} is stationary with autocovariance function:
γ_Y(h) = ∑_{j,k=−∞}^{∞} ψ_j ψ_k γ_X(h − j + k)
For X ∈ ℂ with zero mean, the process {Y_t} is stationary with autocovariance function:
γ_Y(h) = E[Y_{t+h} Ȳ_t] = ∑_{j=−∞}^{∞} ∑_{k=−∞}^{∞} ψ_j ψ̄_k γ_X(h − j + k), h = 0, ±1, …
5.6.1 Proof
• If {X_t} is stationary, we have that
E[X_t] = m
where m is finite and independent of t, and that its second moment is finite. Thus we can apply Theorems 3 and 4.
• From Theorem 3 and Theorem 4, we know that
∑_{j=−∞}^{∞} ψ_j X_{t−j}
converges in second and in first moment. Thus, Y_t has a finite second moment, which is the first requirement for stationarity.
• Now we need to check its first moment. We know that its first moment is finite, but is it also time independent?
E[Y_t] = lim_{n→∞} ∑_{j=−n}^{n} ψ_j E[X_{t−j}] = ∑_{j=−∞}^{∞} ψ_j E[X_t] = m ∑_{j=−∞}^{∞} ψ_j
Yes, it is! Notice that finding its mean is much simpler than in Theorem 3 because we are not finding E|Y_t|. We know it exists, so E[Y_t] must exist, and we just have to find it in the usual easy way.
• Notice that we could also find the variance of Y_t by finding the variance of the infinite sum, which is straightforward given that in this case Var(X_t) = σ²
• Now, we just have to find the autocovariance function for Y_t:
E[Y_{t+h} Y_t] = lim_{n→∞} E[(∑_{j=−n}^{n} ψ_j X_{t+h−j})(∑_{k=−n}^{n} ψ_k X_{t−k})]
= ∑_{j,k=−∞}^{∞} ψ_j ψ_k γ_X(h − j + k) + (E[X_t])² (∑_{j=−∞}^{∞} ψ_j)²
Thus, E[Y_t] and E[Y_{t+h} Y_t] are both finite and independent of t. The autocovariance function γ_Y(·) of {Y_t} is given by
γ_Y(h) = E[Y_{t+h} Y_t] − E[Y_{t+h}] E[Y_t] = ∑_{j,k=−∞}^{∞} ψ_j ψ_k γ_X(h − j + k)
6 Spectral Representation
6.1 Frequency Domain
• Frequency is the number of occurrences of a repeating event per unit of time (e.g. beats per second).
• The frequency domain refers to the analysis of mathematical functions or signals with respect to
frequency, rather than time.
• In signal processing, a signal is a function that conveys information about a phenomenon. In electronics
and telecommunications, it refers to any time varying voltage, current or electromagnetic wave that
carries information
6.2 Frequency Domain versus Time Domain
• Thus, what you get is a wave-ish pressure-versus-time graph, because it is the combination of two pure frequencies.
• However, the resulting series is not a pure sine wave.
• Now, imagine that you add more notes; then the resulting series is even more complicated.
• Recall we only see the final series, so is there a way to decompose a final signal (series) into the pure frequencies that make it up?
• Well, for that we need a mathematical machine that treats signals with a given frequency differently from how it treats other signals.
• An example is the Fourier transform, which converts a time function into a sum or integral of sine waves of different frequencies, each of which represents a frequency component.
6.3 Advantages
• The simplification of the mathematical analysis.
on the circle, and there are 6 times that the vector reaches its maximum length. So, a single rotation around the circle lasts 2 seconds
• Let's define a cycle as one full rotation of the vector on the circle. Thus, in this case we have that the vector on the circle is rotating at 0.5 cycles per second (or 1 cycle per 2 seconds, i.e., 1 rotation every 2 seconds.)
• Thus, at the moment we have two frequencies.
• There's the frequency of our signal, which goes up and down three times per second. And then, separately, there's the frequency with which we're wrapping the graph around the circle, which at the moment is half a cycle per second.
• Of course, the center of mass is a two-dimensional thing, and requires two coordinates to fully keep
track of, but for the moment, let’s only keep track of the x coordinate.
• From Figure 13, 14, 15, 16, 17 and 18, we can observe that:
– For a frequency of 0, when everything is bunched up on the right, this x coordinate is relatively
high
– Then, as you increase that winding frequency, and the graph balances out around the circle, the
x coordinate of that center of mass goes closer to 0 and it just kind of wobbles around a bit.
– But then, at three beats per second, there’s a spike as everything lines up to the right.
• Notice that in this example we have that the original series is not around zero. This is why we have
a big number for frequency zero (The cosine wave is shifted up). If we allow the original series to be
around zero, then the spike for frequency zero does not show up and we only have a spike on frequency
3. See Figure 19
• Thus, we can clearly see how the winding frequency affects the center of mass
• Those graphs of the center of mass as a function of the winding frequency are an almost-Fourier transform of the original signal
• This is super important! Imagine having as the original series the sum of a 3 and a 2 beats-per-second signal. Then, if we only look at the final series, we are not able to identify the pure signals. However, if we are able to wrap it around a circle and then graph its center of mass as a function of the winding frequency, we will be able to detect the spikes of the pure signals. See Figure 20
• Now what’s going on here with the two different spikes, is that if you were to take two signals and
then apply this Almost-Fourier transform to each of them individually, and then add up the results,
what you get is the same as if you first added up the signals, and then applied this Almost-Fourier
transform. See Figure 21
6.4.5 Winding the original series around a circle on the Complex Plane
• Now, what is missing for the Fourier Transform?
• Well, until now we were focusing of the x value of the center of mass. However, the circumference is
graphed in a two dimensions plane. So what about the y value of the center of mass?
• Well, we can think of it as the Complex Plane. Then, the center of mass is a complex number that
has both real and imaginary part
• Complex numbers lend themselves to really nice descriptions of things that have to do with winding
(a twisting movement or course) and rotation.
• For example, Euler's Formula tells us that if you take e to the power of some real number θ times i,
z = e^{iθ}
you're gonna land on the point that you get if you were to walk that number of units around a circle with radius 1, counterclockwise starting on the right. See Figure 22
• So, imagine you wanted to describe rotating at a rate of one cycle per second:
– Recall that π = 3.1416… and that the full circumference of a circle with radius 1 is 2π. Thus we have that 1 cycle can be denoted by 2π.
– Then, you could start by taking the expression e^{2πti}, where t is the amount of time that has passed
– Then, multiply t by f, so that f captures the winding frequency. If f = 1, then we are in the case of 1 cycle per unit of time t (a second, in this example)
– For example, if f = 1/10, this vector makes one full turn every 10 seconds, since the time has to increase all the way to 10 before the full exponent looks like 2πi. So, we have 1 cycle every 10 seconds, or 1/10 of a cycle every second
• Remember that the idea is to wrap the series around a circle on the complex plane
• We can do it by using e^{−2πfti}
• Let's say that the original series can be described by g(t); then the wrapped-up version around the circle on the complex plane is given by:
ϕ(t) = g(t) e^{−2πfti}
See Figure 23
• This is amazing, since this really small expression is a super elegant way to encapsulate the whole idea of winding a graph around a circle with a variable frequency f.
• Now, remember we want to express the center of mass as a function of the frequency. How can we obtain the center of mass?
• Well, we need to consider all the observed data (i.e., all of time) and let the only exogenous variable in the function ϕ(·) be the frequency per unit of time.
• Thus, we could take the sum (for discrete time) or integral (for continuous time) of that expression as the number of observations goes to infinity.
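A small numerical sketch of this construction, taking g(t) = cos(2π·3t) as the pure 3-beats-per-second signal and averaging g(t)e^{−2πift} over the sample (the grid and the frequencies tried are arbitrary choices):

    import numpy as np

    t = np.linspace(0.0, 4.0, 4_000, endpoint=False)   # 4 seconds of signal
    g = np.cos(2 * np.pi * 3 * t)                      # pure 3-beats-per-second signal

    for f in (0.5, 2.0, 3.0):                          # winding frequencies
        center_of_mass = np.mean(g * np.exp(-2j * np.pi * f * t))
        print(f, abs(center_of_mass))   # near 0, except a spike at f = 3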
Figure 6: Pure signal with 3 beats per second on the time domain
6.5 Fourier Transform - Formal Definition
x̃(ω) = ∑_{t=−∞}^{∞} x_t e^{−itω}
Figure 7: Wrapping the pure signal with 3 beats per second up around a circle
Figure 8: Wrapping the pure signal with 3 beats per second up around a circle faster
Figure 9: Wrapping the pure signal with 3 beats per second up around a circle slower
Figure 10: The winding frequency matches the frequency of our signal (three beats per second)
Figure 12: The center of mass when the winding frequency matches the frequency of our signal
Figure 13: Graph of the center of mass as function of the winding frequency
Figure 14: Graph of the center of mass as function of the winding frequency
Figure 15: Graph of the center of mass as function of the winding frequency
Figure 16: Graph of the center of mass as function of the winding frequency
Figure 17: Graph of the center of mass as function of the winding frequency
Figure 18: Graph of the center of mass as function of the winding frequency
Figure 19: The original cosine wave when centered around zero
Figure 21: Decomposing the pure signals of an accumulated signal
Figure 23: Using Euler’s Formula to get the winding version of the original signal
in which:
• ω ∈ (−π, π]:
– This is to put an explicit boundary on the frequency: we are limited to something between 0 and 1 cycle, which in this case is 2π radians.
– Why? When we observe economic data for any unit of time that means that each point changes
every time t changes.
– That is, we cannot have a movement between t=1 and t=2 since by construction there is nothing
being measured there.
– For example, when observing GDP for t in months: when t goes from Dec2000 to Jan 2001, there
is no change in the value of GDP between those two units of time because it is not being measured.
– However if we go to the example of the music tone, we have that the air pressure reaches its max
3 times between t=1 second and t=2 seconds.
– Thus, we have 3 beats per second and eventually 3 cycles per second. That cannot happen with
economic data. Let’s imagine t=1,2,3,4. Then if GDP is 1000, 2000, 3000, 2000, we have that it
reaches its max every 3 months, so eventually we will have 1/3 cycles per month (or 2π/3 radians
per unit of time).
– Now, if GDP is 1000, 2000, 1000, 2000, we have that it reaches its max every 2 months, so
eventually we will have 1/2 cycles per month (or 2π/2 radians per unit of time).
– Thus, by construction when looking at economic data for any frequency of time, we will not have
something like 1,2,3,... cycles per second.
– As much we can have 1/2 cycles per unit of time (2π/2 radians per unit of time). Thus, having
ω ∈ (−π, π] makes a lot of sense.
• ω is the winding frequency but in a different measure:
ω = 2πf
• Any given, fixed value of ω says how quickly e^{−itω} circumnavigates the unit circle. For instance, ω = 2π means a full rotation of the unit circle in 1 unit of t, that is, 1 cycle in one unit of time. This is the same as f = 1, for which ω = 2πf = 2π, meaning 1 cycle per unit of time.
• We can read it as ω radians per unit of time, which does not imply a full rotation or cycle. It is a full rotation when ω = 2π.
• So if the unit of t is a quarter:
– ω = π:
∗ One pi radian per unit of time, or half a cycle per unit of time.
∗ This is like in the previous subsection when using f = 1/2, so we obtain the same exponent: πti.
∗ That is 1 cycle every 2 units of time, which is equivalent to 0.5 cycle every 1 unit of time.
∗ If the true g(t) is such that xt = g(t) = cos(ωt), then it has 1 beat every two quarters.
∗ This means alternations every unit of time: in t we are on the valley, in t + 1 we are on the peak, in t + 2 we are on the valley again.
– ω = π/2:
∗ Half a pi radian per unit of time.
∗ That is, 0.25 cycle per unit of time.
∗ This is like in the previous subsection using f = 1/4, so we obtain the same exponent: (π/2)ti.
∗ That is 1 cycle every 4 units of time, which is equivalent to 0.25 cycle every 1 unit of time.
∗ If the true g(t) is xt = cos(ωt), it has 1 beat every 4 quarters.
∗ So we have annually recurring events.
• Recall, by Euler's formula:
z = e^{−iω}
|z| = 1
x̃(ω) = Σ_{t=−∞}^{∞} x_t z^t
, with z = e^{−iω} over ω ∈ (−π, π]
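To make the winding idea concrete, here is a minimal numerical sketch (my own illustration, not from the notes): we truncate the sum x̃(ω) = Σ_t x_t e^{−itω} to a finite sample of a pure cosine signal and scan ω over (−π, π]; the modulus of the resulting "center of mass" peaks at the signal's own frequency.

```python
# Minimal sketch: truncated DTFT of a pure cosine, scanning winding frequencies.
import numpy as np

T = 400
t = np.arange(T)
omega0 = np.pi / 2                  # true frequency: 1 cycle every 4 periods
x = np.cos(omega0 * t)              # the "pure signal" x_t = cos(omega0 * t)

omegas = np.linspace(-np.pi, np.pi, 1001)
# Winding: for each candidate frequency w, wrap x_t around the unit circle
xtilde = np.array([np.sum(x * np.exp(-1j * t * w)) for w in omegas])

peak = omegas[np.argmax(np.abs(xtilde))]
print(f"|center of mass| peaks at omega = {peak:.3f} (true = +/-{omega0:.3f})")
```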
6.6.1 Proof
• This follows from writing out the right side:
(1/2π) ∫_{−π}^{π} x̃(ω) e^{iωj} dω = (1/2π) ∫_{−π}^{π} Σ_t x_t e^{−iω(t−j)} dω = (1/2π)(2π)x_j = x_j
where we use that ∫_{−π}^{π} e^{−iω(t−j)} dω equals 2π when t = j and 0 otherwise.
x̄(ω) = (1/2π) Σ_{t=−∞}^{∞} x_t e^{−itω}
6.7.2 Splitting the factor of 1/(2π) evenly between the Fourier transform and its inverse
• The Fourier Transform:
x̄(ω) = (1/√(2π)) Σ_{t=−∞}^{∞} x_t e^{−itω}
6.8 Lag Operator Calculus and Fourier Transforms
• Consider
yt = h(L)εt
• For any form of h(L), we have that the Fourier Transform of yt is given by:
ỹ(ω) = h(e^{−iω}) ε̃(ω)
• So, the Fourier Transform of yt is given by the lag polynomial evaluated on the unit circle times the Fourier Transform of εt
• How is this possible? In the following lines I will show it
6.8.1 Case 1
• Consider
h(L) = h0 + h1 L
yt = h0 εt + h1 ε_{t−1}
ỹ(ω) = Σ_{t=−∞}^{∞} y_t e^{−itω}
ỹ(ω) = Σ_{t=−∞}^{∞} (h0 ε_t + h1 ε_{t−1}) e^{−itω}
ỹ(ω) = h0 Σ_{t=−∞}^{∞} ε_t e^{−itω} + h1 Σ_{t=−∞}^{∞} ε_{t−1} e^{−itω}
TRICK: We will rewrite the second term so that it is indexed by t − 1:
ỹ(ω) = h0 Σ_{t=−∞}^{∞} ε_t e^{−itω} + h1 Σ_{t=−∞}^{∞} ε_{t−1} e^{−itω} e^{−iω} e^{iω}
ỹ(ω) = h0 Σ_{t=−∞}^{∞} ε_t e^{−itω} + h1 e^{−iω} Σ_{t=−∞}^{∞} ε_{t−1} e^{−itω+iω}
ỹ(ω) = h0 Σ_{t=−∞}^{∞} ε_t e^{−itω} + h1 e^{−iω} Σ_{t=−∞}^{∞} ε_{t−1} e^{−iω(t−1)}
Recall the definition
ε̃(ω) = Σ_{t=−∞}^{∞} ε_t e^{−itω}
Given that the definition of a Fourier Transform implies that we are using all values of t (i.e., the sum goes from negative infinity to positive infinity), we can simply change the indexation in the infinite sum from t to t − 1 and it will still be the Fourier Transform of εt:
ε̃(ω) = Σ_{t=−∞}^{∞} ε_{t−1} e^{−i(t−1)ω}
Thus,
ỹ(ω) = h0 ε̃(ω) + h1 e^{−iω} ε̃(ω) = (h0 + h1 e^{−iω}) ε̃(ω) = h(e^{−iω}) ε̃(ω), with h(e^{−iω}) = h0 + h1 e^{−iω}
That is, the lag polynomial h(L) is being evaluated on the unit circle, given that e^{−iω} is the exponential form of a complex number whose modulus equals 1: |e^{−iω}| = 1.
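As a sanity check on Case 1, the following sketch (an assumed setup, not from the notes: T = 200 draws, h0 = 1, h1 = 0.5) verifies the re-indexing trick numerically: the finite-sample transform of yt equals h0 times the transform of the current shocks plus h1 e^{−iω} times the transform of the lagged shocks, exactly.

```python
# Sketch: verify ytilde(w) = h0*e_curr(w) + h1*e^{-iw}*e_lag(w) for an MA(1) filter.
import numpy as np

rng = np.random.default_rng(0)
T = 200
eps = rng.standard_normal(T + 1)     # eps[k] plays the role of eps_t for t = -1..T-1
h0, h1 = 1.0, 0.5
t = np.arange(T)
y = h0 * eps[1:] + h1 * eps[:-1]     # y_t = h0*eps_t + h1*eps_{t-1}, t = 0..T-1

w = 0.7
ytilde = np.sum(y * np.exp(-1j * t * w))
e_curr = np.sum(eps[1:] * np.exp(-1j * t * w))          # sum_t eps_t e^{-itw}
e_lag = np.sum(eps[:-1] * np.exp(-1j * (t - 1) * w))    # sum_t eps_{t-1} e^{-i(t-1)w}
print(np.allclose(ytilde, h0 * e_curr + h1 * np.exp(-1j * w) * e_lag))  # True
```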
6.8.2 Case 2
• Consider
(1 − ρL)yt = εt
εt = (1 − ρL)yt
εt = h(L)yt
• Applying the same result, the transforms are related by:
ε̃(ω) = (1 − ρe^{−iω}) ỹ(ω)
• The same logic applies to any lag polynomial, e.g. yt = θ(L)εt.
That is, we can express the autocovariance function of Xt as a function of only one variable, s, instead of two variables (r, r + s). This is key because it will allow us to generate a new time series.
• Since γx(s) denotes the autocovariance function of Xt at lag s, we could generate a new time series γx(s) for s = 0, 1, 2, 3, 4, 5, 6, . . . .
• It is a time series because it is indexed by s, which is a unit of time (e.g., depending on the unit of time for t, the lag can be in terms of years, months, quarters, seconds).
• Changing the notation a bit, we could have the new time series:
γx(j) ≡ γj
for j = 0, 1, 2, 3, 4, 5, 6, . . . .
• Thus, we could also have the Fourier transform of γj:
γ̃(ω) = (1/2π) Σ_{j=−∞}^{∞} γj e^{−ijω}
, in which ω ∈ (−π, π] represents the frequency. This is known as the Spectral density of Xt
• This is extremely useful, since from the Spectral density of Xt we are able to find each autocovariance of Xt, γj, by using the Fourier Inverse.
• Recall that for a (possibly complex-valued) stationary process, γj = E[xt x̄t−j] = γ̄−j.
• Then, the Spectral Density of xt is given by the Fourier Transform of the autocovariance function of xt:
sx(ω) = γ̃(ω) = Σ_{j=−∞}^{∞} γj e^{−ijω}
and, for real-valued xt (so that γj = γ−j),
sx(ω) = sx(−ω)
6.9.3 Approximation of the spectral density by the second moment of the Fourier transform
• Notice that:
E[x̃(ω) conj(x̃(ω))] = E[(Σ_{t=−∞}^{∞} x_t e^{−itω})(conj(Σ_{t=−∞}^{∞} x_t e^{−itω}))]
To see what this delivers, take just three observations, t = −1, 0, 1:
= E[(x_{−1} e^{−i(−1)ω} + x_0 e^{−i(0)ω} + x_1 e^{−i(1)ω})(x̄_{−1} conj(e^{−i(−1)ω}) + x̄_0 conj(e^{−i(0)ω}) + x̄_1 conj(e^{−i(1)ω}))]
Since when z = x + iy we have z̄ = x − iy, then when z = e^{−i(−1)ω} we must have z̄ = e^{i(−1)ω}. Further, recall zz̄ = |z|²; for z = e^{−i(−1)ω}, we have |z| = 1. Finally, notice that e^{−i(0)ω} = e^0 = 1. Expanding the product therefore gives
= E[x_{−1}x̄_{−1} + x_{−1}x̄_0 e^{iω} + x_{−1}x̄_1 e^{2iω} + x_0 x̄_{−1} e^{−iω} + x_0 x̄_0 + x_0 x̄_1 e^{iω} + x_1 x̄_{−1} e^{−2iω} + x_1 x̄_0 e^{−iω} + x_1 x̄_1]
Grouping the terms by lag (each product x_t x̄_{t−j} carries the factor e^{−ijω} and has expectation γj), this is approximately:
E[x̃(ω) conj(x̃(ω))] ≅ Σ_{j=−2}^{2} γj e^{−ijω}
Therefore, for the more general case with infinite observations for xt, the Spectral density of xt can be approximated by:
sx(ω) = γ̃(ω) = Σ_{j=−∞}^{∞} γj e^{−ijω} ≅ E[x̃(ω) conj(x̃(ω))]
Thus, we can interpret sx as spreading out x's variability across the range of frequencies ω ∈ (−π, π].
Example: White Noise
• Suppose
xt = εt
γj = σ² for j = 0
γj = 0 otherwise
sx(ω) = γ̃(ω) = Σ_{j=−∞}^{∞} γj e^{−ijω} = γ0 = σ²
Example: MA(1)
• Suppose
xt = εt + θ1 εt−1
γj = 0 for |j| > 1
sx(ω) = Σ_{j=−∞}^{∞} γj e^{−ijω}
sx(ω) = γ1 e^{iω} + γ0 + γ1 e^{−iω}
sx(ω) = γ0 + 2γ1 cos ω
Example: AR(1)
• Suppose
xt = ρxt−1 + εt
, where εt is a white noise with variance σ² and |ρ| < 1 so that xt is stationary.
• Thus, we have that for xt:
γj = ρ^{|j|} σ²/(1 − ρ²)
sx(ω) = Σ_{j=−∞}^{∞} γj e^{−ijω}
sx(ω) = Σ_{j=−∞}^{∞} ρ^{|j|} (σ²/(1 − ρ²)) e^{−iωj}
sx(ω) = (σ²/(1 − ρ²)) Σ_{j=−∞}^{∞} ρ^{|j|} e^{−iωj}
• This sum is more complicated to evaluate directly. However, we can use a property of the lag operator when finding the spectral density: if
yt = Σ_{j=0}^{∞} ψj L^j xt = ψ(L)xt
then
sy(ω) = |Σ_{j=0}^{∞} ψj e^{−iωj}|² sx(ω)
sy(ω) = |ψ(e^{−iω})|² sx(ω)
So, the spectral density of yt in terms of the spectral density of xt involves the lag polynomial evaluated
on the unit circle, which implies that the lag polynomial evaluated on the unit circle must converge.
This is why stationarity plays an important role when finding the spectral density of an AR process.
Proof
• Notice that from Theorem 3 and Theorem 4, we immediately have that yt has a finite second moment.
• Further, given that xt is stationary, yt is also stationary (Theorem 6).
• Now, let's allow both variables to be possibly complex valued. Then, since the autocovariance function of yt is given by:
γy(h) = E(y_{t+h} ȳt) = Σ_{j=0}^{∞} Σ_{k=0}^{∞} ψj ψ̄k γx(h − j + k), h = 0, ±1, . . . (9)
substituting the inverse-transform representation of γx gives
γy(h) = Σ_{j=0}^{∞} Σ_{k=0}^{∞} ψj ψ̄k ∫_{−π}^{π} sx(ω) e^{iω(h−j+k)} dω
= ∫_{−π}^{π} sx(ω) Σ_{j=0}^{∞} Σ_{k=0}^{∞} ψj ψ̄k e^{iω(h−j+k)} dω
= ∫_{−π}^{π} sx(ω) (Σ_{j=0}^{∞} ψj e^{−iωj})(Σ_{k=0}^{∞} ψ̄k e^{iωk}) e^{iωh} dω
= ∫_{−π}^{π} sx(ω) |Σ_{j=0}^{∞} ψj e^{−iωj}|² e^{iωh} dω
so that
γy(h) = ∫_{−π}^{π} |Σ_{j=0}^{∞} ψj e^{−iωj}|² sx(ω) e^{iωh} dω (11)
By using the Fourier inverse definition for the Spectral density, it must be that:
γy(h) = ∫_{−π}^{π} sy(ω) e^{iωh} dω (12)
Matching the two integrands:
sy(ω) = |Σ_{j=0}^{∞} ψj e^{−iωj}|² sx(ω), ω ∈ [−π, π] (13)
sy(ω) = |ψ(e^{−iω})|² sx(ω), ω ∈ [−π, π] (14)
Recall |e−iωj | = 1. Thus, we are evaluating ψ(z) on the unit circle: |z| = 1
Case 1: Stationary AR(1)
• Consider
ϵt = (1 − ρL)yt
ϵt = h(L)yt
with h(L) = 1 − ρL
• Then, we immediately have a relationship between the spectral densities:
sε(ω) = (1 − ρe^{−iω})(1 − ρe^{iω}) sy(ω)
• Given that εt is a white noise with variance σ²:
σ² = (1 − ρe^{−iω})(1 − ρe^{iω}) sy(ω)
sy(ω) = (1 − ρe^{−iω})^{−1} (1 − ρe^{iω})^{−1} σ²
, in which the lag polynomial h(L) is being evaluated at L = e^{−iω} and L = e^{iω}. Given that |e^{−iω}| = |e^{iω}| = 1, we are analyzing the lag polynomial on the unit circle. From Section 2.3, we know that h(L)^{−1} = (1 − ρL)^{−1} evaluated on the unit circle will converge only if |ρ| < 1, which is the same condition for the AR(1) to be stationary.
• Thus, using (1 − ρe^{−iω})(1 − ρe^{iω}) = 1 − 2ρ cos(ω) + ρ², the Spectral density of a stationary AR(1) process is given by:
sy(ω) = σ² / (1 − 2ρ cos(ω) + ρ²)
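A quick numerical check of this closed form (a sketch of my own, with assumed values ρ = 0.8 and σ² = 1): the truncated version of the infinite autocovariance sum above matches the formula.

```python
# Sketch: truncated spectral-density sum for an AR(1) vs. its closed form.
import numpy as np

rho, sigma2 = 0.8, 1.0
w = 1.1
J = 500                                            # truncation of the infinite sum
j = np.arange(-J, J + 1)
gamma = sigma2 / (1 - rho**2) * rho**np.abs(j)     # AR(1) autocovariances
s_sum = np.sum(gamma * np.exp(-1j * w * j)).real   # truncated spectral density
s_closed = sigma2 / (1 - 2 * rho * np.cos(w) + rho**2)
print(s_sum, s_closed)                             # the two numbers agree closely
```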
Case 2: AR(m)
• Consider
ϵt = (1 − ρ1 L − ρ2 L2 − · · · − ρm Lm )yt
(1 − ρ1 L − ρ2 L2 − · · · − ρm Lm ) = (1 − λ1 L)(1 − λ2 L) . . . (1 − λm L)
ϵt = (1 − λ1 L)(1 − λ2 L) . . . (1 − λm L)yt
sϵ (ω) = (1 − λ1 e−iω )(1 − λ1 eiω )(1 − λ2 e−iω )(1 − λ2 eiω ) . . . (1 − λm e−iω )(1 − λm eiω )sy (ω)
• Given that the process is stationary, we can move to the LHS all the inverted lag polynomials evaluated on the unit circle and, by using Euler's formula as in the previous case, we have that the spectral density of an AR(m) process is given by:
sy(ω) = σ² Π_{j=1}^{m} 1/(1 − 2λj cos(ω) + λj²)
7 AR(p)
In this chapter we will use all the above results to analyze the properties of an AR(p) time series:
π(L) = 1 − ρ1 L − ρ2 L² − · · · − ρp L^p
• If π(L)^{−1} exists, we can express Xt as a function of εt, so we solve the difference equation for Xt. However, the solution involves a nonlinear function of L: π(L)^{−1}.
• Recall that we are interested in finding the properties of Xt (e.g., is it stationary?), and it is impossible to do so directly from:
π(L)^{−1} εt
For instance, L does not mean anything on its own; it needs to be applied to a time series. Can I apply it to εt using the expression above? . . . clearly NO! Not even for the AR(1) case:
(1/(1 − ρ1 L)) εt
• I know how to deal with L when it is all by itself (to any power) multiplying a time series. Then, the question is: can I express π(L)^{−1} in such a way that I can get that?
• Luckily for us, the answer is YES! We can take advantage of Taylor's Theorem to express π(L)^{−1} as a power series that converges to π(L)^{−1} on the largest open disk.
• For doing that, we need to check if π(z)^{−1} is a regular function.
• If so, we can express π(L)^{−1} as an infinite sum that converges, and we are able to analyze the features of Xt since we will be able to apply L to εt.
• π̃(z) is known as the Reflected polynomial of π(z). Therefore, π(z) is also the Reflected polyno-
mial of π̃(z):
π(z) = z p π̃(z −1 ) (18)
Example:
π(z) = 1 − ρ1 z − ρ2 z²
π(z^{−1}) = 1 − ρ1 z^{−1} − ρ2 z^{−2}
π̃(z) = z²(1 − ρ1 z^{−1} − ρ2 z^{−2})
π̃(z) = z² − ρ1 z − ρ2
Now,
π̃(z^{−1}) = z^{−2} − ρ1 z^{−1} − ρ2
π(z) = z²(z^{−2} − ρ1 z^{−1} − ρ2)
π(z) = 1 − ρ1 z − ρ2 z²
Uhlig actually follows the same strategy to define the characteristic polynomial:
π̃(z) ≡ P(λ)
The roots solve:
π̃(z) = 0
z^p π(z^{−1}) = 0
Since the roots of π̃(z) cannot be zero, we are finding the values of z such that:
π(z^{−1}) = 0
Let's denote by zi the roots of the characteristic polynomial π(z). Given this notation, the roots of π(z^{−1}) should be zi^{−1}, since π(z^{−1}) is π(z) in the inverse variable. Therefore, we have that:
λi = zi^{−1} (19)
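The relation λi = zi^{−1} is easy to verify numerically. A small sketch with a hypothetical AR(2) polynomial π(z) = 1 − 0.5z − 0.3z² (the coefficients are assumed values):

```python
# Sketch: the roots of the reflected polynomial are the inverses of the roots of pi(z).
import numpy as np

# np.roots takes coefficients from highest to lowest power
z_roots = np.roots([-0.3, -0.5, 1.0])      # roots of pi(z) = 1 - 0.5z - 0.3z^2
lam_roots = np.roots([1.0, -0.5, -0.3])    # roots of pi~(z) = z^2 - 0.5z - 0.3
print(np.sort(1 / z_roots), np.sort(lam_roots))   # identical up to ordering
```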
7.4 Is the function a regular one?
• Notice that the lag polynomial is also a function.
• Checking if a complex function is regular follows Section 2.
• However, there is an issue. It gets more complicated to analyze π(z)−1 when p is greater than 1.
• Then, is there any trick to deal with it?
• The answer is YES! We will take advantage of the roots we have found for establishing when π(z)−1
is well defined by applying the Fundamental Theorem of Algebra
• For example, suppose the polynomial P(z) = 0 has roots z1 = 2, z2 = 1 and z3 = −1. Thus, we can rewrite the polynomial as follows:
P(x) = (1 − (1/2)x)(1 − x)(1 + x)
• Further, taking into account that P(x) is also the reflected polynomial of P̃(x):
P(x) = x³ P̃(x^{−1})
so,
P(x) = x³ (x^{−1} − 1/2)(x^{−1} − 1)(x^{−1} + 1)
P(x) = x(x^{−1} − 1/2) · x(x^{−1} − 1) · x(x^{−1} + 1)
We can also express P(x) in terms of the roots of its reflected characteristic polynomial as:
P(x) = (1 − (1/2)x)(1 − x)(1 + x)
• Going back to the general case, we know that λi are the roots of the characteristic polynomial π̃(z).
Thus, λi are the values of z such that π̃(z) = 0
• We know that λi = zi−1 and zi are the roots of the characteristic polynomial π(z)
• Thus, we can also write:
π(z) = (1 − (1/z1)z)(1 − (1/z2)z)(1 − (1/z3)z) · · · (1 − (1/zp)z) (20)
• This is a nicer way because, to know if π(z)^{−1} is a regular function, we just have to check whether each (1 − λi z)^{−1} is a regular function: if two functions f(z) and g(z) are analytic in a domain D, then their sum and their product are both analytic in D. For further details, check Section 1.10.
• Based on our results in Section 2.4, we conclude that the characteristic function π(z)^{−1} is well defined and analytic for all z ∈ C∖L, in which:
L = { z ∈ C : Re(z) ≥ 1/λi and Im(z) = 0 }
(one such half-line for each root λi).
• Thus, from Taylor's Theorem we have that π(z)^{−1} can be expressed as a power series around z0 ∈ C∖L:
f(z) = Σ_{n=0}^{∞} (f^{(n)}(z0)/n!) (z − z0)^n
where the series converges on any disk |z − z0| < r contained in C∖L. Since 0 ∈ C∖L, we select z0 = 0 (largest open disk). Thus, the series converges on the largest open disk only if:
|λi z| < 1
• We know that we can find the values of the roots λi. However, a natural question that emerges at this point is: what is z doing in there? Well, remember that z is just the argument in the characteristic function, and it is originally the variable L. But it doesn't help much to know that. I mean, what is the intuition of choosing a value for L? It was supposed to be a trick just to get along with time series. We know for sure that we need z such that:
z ∈ C∖L
and
|λi z| < 1
A set of values for z that satisfies both conditions are those such that |z| = 1. That is, when π(z)^{−1} is evaluated on the unit circle. We will pick this set because we need it for the Spectral density.
7.4.2 Is the characteristic polynomial π(z)−1 analytic on the unit circle, |z| = 1?
• So, we have that π(z)−1 can be expressed as an infinite sum as long as the value of z is such that:
z ∈ C\L
and
|λi z| < 1
• For the AR(1) case, π(z) = 1 − λ1 z, and the power series is the geometric series:
π(z)^{−1} = Σ_{i=0}^{∞} (λ1 z)^i
Notice that for AR(1), λ1 = ρ. Then, using L instead of z in the inverse lag polynomial, we have:
π(L)^{−1} = Σ_{i=0}^{∞} (ρL)^i
Xt = Σ_{i=0}^{∞} (ρL)^i εt
Xt = Σ_{i=0}^{∞} ρ^i ε_{t−i}
As this is an infinite sum, we would like to know if it converges or not. Be careful! Having π(z)^{−1} as a regular function just allows us to express the function as an infinite sum. However, after replacing the infinite sum for z = L in the DGP of Xt, we have an infinite sum that involves εt. So, to fully analyze Xt, we need to analyze the infinite sum involving εt.
• Given that this infinite sum involves a random variable, εt, we know that we are talking about convergence in probability and convergence in q-th moment (see Section 4).
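A minimal simulation sketch (assumed values: ρ = 0.7, εt i.i.d. standard normal) showing that the AR(1) recursion and the truncated MA(∞) sum produce numerically the same path once the truncation is long enough:

```python
# Sketch: AR(1) recursion vs. truncated MA(infinity) representation.
import numpy as np

rng = np.random.default_rng(1)
rho, T, burn = 0.7, 200, 500
eps = rng.standard_normal(burn + T)

# Recursive solution, started far in the past so the initial condition dies out
x_rec = np.zeros(burn + T)
for t in range(1, burn + T):
    x_rec[t] = rho * x_rec[t - 1] + eps[t]

# Truncated MA(infinity): X_t ~= sum_{i=0}^{burn-1} rho^i eps_{t-i}
weights = rho ** np.arange(burn)
x_ma = np.array([np.sum(weights * eps[t - np.arange(burn)])
                 for t in range(burn, burn + T)])

print(np.max(np.abs(x_rec[burn:] - x_ma)))   # tiny: the two representations agree
```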
• Thus, we would like to be able somehow to apply Theorem 3.
• We said that we needed z and λi to satisfy the conditions for π(z)^{−1} to be regular. A set of values that satisfies both requirements is given by:
|z| = 1
|λi| < 1
• Under these conditions, for the AR(1) case:
Σ_{i=0}^{∞} ρ^i < ∞
Σ_{i=0}^{∞} |ρ^i| < ∞
Let's denote:
ψj = ρ^j
Σ_{j=0}^{∞} |ψj| < ∞
The other condition is satisfied because we assumed εt is i.i.d. with zero mean and variance 1. Thus, it is a particular case of the one in Theorem 3.
• From Theorem 3, we immediately know that the sum converges in the 1st moment, and therefore by the Markov inequality it converges in probability.
• Further, from Theorem 4, we immediately know that it converges in the second moment as well.
• Thus, we know that Xt has a finite second moment, which implies a finite first moment. That is, it has a finite variance and a finite mean.
• In short,
1. We need to be able to express π(L)^{−1} as an infinite sum to analyze Xt. Otherwise we simply cannot do it.
2. Thus we need z and λi to be such that:
z ∈ C∖L
and
|λi z| < 1
3. Assuming those is not enough, given that now we are dealing with an infinite sum that involves εt.
4. If |λi| < 1, and given the properties of εt, the conditions in Theorem 3 and Theorem 4 are satisfied and we conclude that Xt has a finite variance.
• However, we would be using the same tools, and therefore arriving at the same conclusions, if we just proceed to analyze whether the characteristic function π(z)^{−1} is holomorphic on the unit circle and converges on the largest open disk.
• Thus, to know if Xt converges in the second moment, we just need to check if the roots, λi, of the characteristic polynomial π̃(z) are such that:
|λi| < 1
• For Xt to be stationary, we need:
1. Xt can be expressed as an infinite sum:
Xt = Σ_{j=0}^{∞} ψj ε_{t−j}
2. Σ_{j=0}^{∞} |ψj| < ∞
3. εt is stationary
• The first requirement is to be able to express Xt as an infinite sum, which can be achieved if |λi| < 1.
• The second requirement is immediately satisfied if |λi| < 1, given that ψj is just a multiplication of the λi (see Section 7.4.1).
• The third requirement is satisfied by definition of εt.
• Therefore, we arrive at the same conditions as in the previous section: for Xt to be stationary we only need to check if the roots, λi, of the characteristic polynomial π̃(z) are such that:
|λi| < 1
7.7 Impulse Response Function
• Consider the mean zero (de-meaned) weakly stationary AR(1) model:
yt = β1 yt−1 + εt
• We might want to know what we should expect the future value yt+k to look like if yt were one unit larger, holding all yt−j, j > 0, fixed.
• This is equivalent to asking how we should expect yt+k to change given a one-unit change in εt.
• WHY?: From the DGP of yt, if we hold yt−1 fixed, the only way to increase yt is through increases in εt.
• The impulse response function is the path that y follows if it is kicked by a single unit shock εt, i.e., ε_{t−j} = 0, εt = 1, ε_{t+j} = 0.
• This function is interesting since it allows us to start thinking about "causes" and "effects".
• For example, you might compute the response of GDP to a shock to GDP and interpret the result as the "effect" on GDP of a shock over time.
• The MA(∞) representation is the same thing as the impulse response function.
• Thus, for a stationary AR(p) process, we know that its stationary solution is given by:
yt = Σ_{j=0}^{∞} ψj ε_{t−j}
This is also known as the MA(∞) representation or the impulse-response function. From this representation we can calculate the effect of a single shock over time.
For instance, the impact of a one-unit increase in ε_{t−k} on yt is given by:
dyt/dε_{t−k} = ψk
• When we plot the change in yt given a one-unit shock to ε_{t−k}, we call this an impulse response function (IRF).
• A stationary process has the property that the effect of a shock does not last forever. That is, if a shock occurs at t, eventually (as k goes to infinity) ψk will be zero:
dy_{t+k}/dεt = ψk → 0 as k → ∞
• We can clearly see that with the stationary AR(1) case, in which:
ψj = ρ^j
ρ^j → 0 as j → ∞
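A short sketch of the AR(1) IRF (ρ = 0.8 is an assumed value), computing ψk = ρ^k both analytically and by simulating the path after a single unit shock:

```python
# Sketch: AR(1) impulse response, analytic vs. simulated.
import numpy as np

rho, K = 0.8, 20
irf_analytic = rho ** np.arange(K)

# Simulate: eps_0 = 1 and all other shocks zero, starting from y = 0
y, irf_sim = 0.0, []
for k in range(K):
    y = rho * y + (1.0 if k == 0 else 0.0)
    irf_sim.append(y)

print(np.allclose(irf_analytic, irf_sim))   # True: the IRF decays like rho^k
```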
• Interpretation? If there is an unexpected increase of one additional dollar traded, how much does dollar volume change k periods ahead?
• Having an IRF vanishing over time is a consequence of being a stationary process. However, being stationary is a sufficient condition but not a necessary one. That is, we can have a nonstationary process whose IRF vanishes over time.
yt = ϕyt−1 + εt, where εt ∼ WN(0, σ²)
• The test statistic is the t-ratio for the null hypothesis ϕ = 1:
t_{ϕ=1} = (ϕ̂ − 1) / SE(ϕ̂)
where ϕ̂ is the estimator of ϕ and SE(ϕ̂) is the usual standard error estimate.
The test is a one-sided left-tail test. If {yt} is stationary (i.e., |ϕ| < 1), then it can be shown that
√T (ϕ̂ − ϕ) →d N(0, 1 − ϕ²)
or
ϕ̂ ∼A N(ϕ, (1 − ϕ²)/T)
and it follows that t_{ϕ=1} ∼A N(0, 1). However, under the null hypothesis of nonstationarity the above result gives
ϕ̂ ∼A N(1, 0)
which clearly does not make any sense.
• Elliott, Rothenberg and Stock (1996) proposed the DF-GLS test and a feasible point optimal test, both of which are constructed using GLS detrended data in order to increase the power performance of the tests.
• Ng and Perron (2001) used the same strategy applied to the family of M tests, as well as to a feasible point optimal test denominated MP_T^GLS.
yt = ρyt−1 + ut
where yt is the variable of interest, t is the time index, ρ is a coefficient, and ut is the error term (assumed to be white noise).
• A unit root is present if ρ = 1. The model would be non-stationary in this case.
• We can rewrite the model by subtracting yt−1 from both sides:
Δyt = δyt−1 + ut, with δ = ρ − 1
so testing ρ = 1 is the same as testing δ = 0.
• Therefore, to reject the null we just have to compare the t-statistic against the critical values of the Dickey-Fuller distribution.
• It is worth pointing out that the critical values depend on the deterministic component of the DGP.
• There are three main versions of the test (a minimal numerical sketch of version 2 follows this list):
1. Test for a unit root:
Δyt = δyt−1 + ut
2. Test for a unit root with a constant (drift):
Δyt = a0 + δyt−1 + ut
3. Test for a unit root with constant and deterministic time trend:
Δyt = a0 + a1 t + δyt−1 + ut
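Here is the promised sketch (a numpy illustration of my own, not the notes' code): regress Δyt on a constant and yt−1 and form the t-statistic on δ. Remember that it must be compared to Dickey-Fuller, not normal, critical values.

```python
# Sketch: DF regression with constant, Delta y_t = a0 + delta*y_{t-1} + u_t.
import numpy as np

rng = np.random.default_rng(2)
T = 250
y = np.cumsum(rng.standard_normal(T))          # a random walk: the null is true

dy = np.diff(y)                                # Delta y_t
X = np.column_stack([np.ones(T - 1), y[:-1]])  # regressors: constant, y_{t-1}
beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
u = dy - X @ beta
s2 = u @ u / (len(dy) - X.shape[1])            # residual variance
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = beta[1] / se
print(f"DF t-statistic on delta: {t_stat:.2f} (5% DF critical value is about -2.88)")
```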
• For testing for the unit root, we use the same critical values from the DF distribution.
• The t distribution can be used for testing hypotheses about βi for i = 1, 2, . . . , k.
• However, a question remains: what is the value of k?
• Using Monte Carlo experiments, Schwert (1989) showed that the value of k has important implications for the size and power of the ADF test, in particular when there is strong negative moving average correlation in the residuals.
• Ng and Perron (1995) constitutes the first study dealing with the analysis of lag-length selection using different criteria.
• They prove that the choice of the data-dependent rule has a bearing on the size and power of the test.
• Moreover, they show that information-based rules such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) tend to select values of k that are consistently smaller than those chosen through sequential testing for the significance of coefficients on additional lags (t-sig method), and that the size distortions associated with the former methods are correspondingly larger.
• The Dickey-Fuller and the Augmented Dickey-Fuller tests have been found to have low power (the power of a test is the probability of rejecting the null hypothesis when it is false) in some circumstances.
– Consider a model wherein ϕ = 0.95.
– By all accounts, it meets our criteria for a stationary process, but the result of the test may indicate non-stationarity, especially in data with a small sample size.
• Based on Phillips and Perron (1988).
• The PP test corrects for any serial correlation and heteroskedasticity in the errors by a direct modification to the test statistics.
• This modification is a nonparametric one.
• The PP test has no need to specify the lag length.
8.7 The M statistics
• Perron and Ng (1996) showed that for AR(1) and MA(1) processes with a negative coefficient close to −1, the PP test exhibits strong size distortions (it rejected the null far too often when the null was true).
• Thus, they proposed the M tests, which were originally proposed by Stock (1999).
• The M statistics are composed of three statistics: MZα̂, MSB, MZtα̂.
• They performed better than the PP statistic.
• While the power gains of the DF-GLS from using GLS detrended data are impressive, simulations also show that the test exhibits strong size distortions when dealing with an MA(1) process with a negative coefficient.
• Since the power gains of the DF-GLS over the DF come from the use of GLS detrended data, it is natural to consider the M tests under GLS detrending.
• Ng and Perron (2001) analyze the asymptotic properties of the M^GLS tests.
• Ng and Perron (2001) extend the M tests developed in Perron and Ng (1996) to allow for GLS detrending of the data.
• They also show that both the use of the MIC and allowing for GLS detrending of the data in the M tests results in a class of M^GLS tests that have desirable size and power properties.
• In conclusion, the MIC (in particular the MAIC version) is a superior rule for selecting the lag length.
9 VAR (p)
9.1 Motivation
• So far we have focused mostly on models where y depends on past observations of y.
• More generally we might want to consider models for more than one variable.
• If we only care about forecasting one series but want to use information from another series we can
estimate an ARMA model and include additional explanatory variables.
• For example if yt is the series of interest, but we think xt might be useful we can estimate models like
yt = β0 + β1 yt−1 + γxt−1 + εt
• This model can be fit by least squares. Our dependent variable is yt and the independent variables are
yt−1 and xt−1
• Once the model is fit, the one-step-ahead forecast is given by:
ŷ_{t+1|t} = E(y_{t+1} | Ft) = β0 + β1 yt + γxt
• Just like the simple AR model, the one-step-ahead forecast variance is σε².
• A joint model for xt and yt is required if we are interested in multiple step ahead forecasts, or if we
are interested in feedback effects from one process to the other.
• For example, if we want the 2-step-ahead forecast of yt, we are looking for
E(y_{t+2} | Ft) = β0 + β1 E(y_{t+1} | Ft) + γ E(x_{t+1} | Ft)
Then, the obvious question is what do we use for:
E(x_{t+1} | Ft)
since x_{t+1} is not known at t.
• Answer: We need a model for x as well
• Before proceeding to the next section, it is important to review Appendix A.
• Then we can write the AR(p) model as the following first order model:
ξt = F ξt−1 + vt (23)
• Recall the reflected polynomial π̃(z) = z^p π(z^{−1}): the AR(p) is stationary if the roots of π̃(z) are inside the unit circle, |λi| < 1, or equivalently if the roots of π(z) are outside the unit circle, |zi| > 1.
• From the stacked form, we have that F is a square matrix, and thus we can find its eigenvalues:
|F − λI| = 0
|λI − F| = π̃(λ)
, so knowing whether the stacked VAR(1) is stationary is the same as knowing whether the eigenvalues of F are inside the unit circle (as the sketch below illustrates).
• Each equation is like an AR(1) model with one other explanatory variable.
• Each equation depends on its own lag and the lag of the other variable.
• We also now have two errors, one for each equation: uxt and uyt
• Since x depends on y and y depends on x, a more thorough understanding of dynamics and forecasting
requires us to jointly consider x and y in the system of equations.
• By defining the following vectors and matrices we end up with a very simple form for the VAR(1)
model. Let:
yt = β 0 + β 1 yt−1 + vt (26)
9.3.2 Assumptions on the errors
• The errors are white noise and uncorrelated with lags of the other errors.
• uxt is uncorrelated with ux,t−j and uy,t−j for j ≠ 0.
• uyt is uncorrelated with ux,t−j and uy,t−j for j ≠ 0.
• However, they may be contemporaneously correlated. If so, we call them reduced-form shocks, since they come from a more complete model that allows for a contemporaneous relationship between xt and yt. It might be the case that the system is such that there is a contemporaneous relationship:
Let's define:
α =
[ 1    αx ]
[ αy   1  ]
εt =
[ εxt ]
[ εyt ]
Thus, αvt = εt,
so,
vt = α^{−1} εt
If we assume that εt are structural shocks (i.e., the very first and purest shocks in the economy, not related to each other at any t), we have that the reduced-form shocks are contemporaneously related:
σ_{ux,uy} = E[uxt uyt] − E[uxt]E[uyt] = E[(εxt + αx εyt)(αy εxt + εyt)] = αy σ²_{εx} + αx σ²_{εy} ≠ 0
Ω =
[ σ²_{ux}      σ_{ux,uy} ]
[ σ_{ux,uy}   σ²_{uy}   ]
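A small simulation sketch of this point (αx = 0.5 and αy = 0.3 are assumed values): even though the structural shocks are uncorrelated, the implied reduced-form shocks vt = α^{−1}εt are contemporaneously correlated.

```python
# Sketch: uncorrelated structural shocks imply correlated reduced-form shocks.
import numpy as np

rng = np.random.default_rng(3)
alpha_x, alpha_y = 0.5, 0.3
alpha = np.array([[1.0, alpha_x],
                  [alpha_y, 1.0]])
eps = rng.standard_normal((2, 100_000))      # iid structural shocks, Sigma_eps = I

v = np.linalg.inv(alpha) @ eps               # reduced-form shocks
print(np.cov(v))                             # off-diagonal terms are nonzero
```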
• Let's use the lag operator:
yt = β1 L yt + vt
(I − β1 L) yt = vt
yt = (I − β1 L)^{−1} vt
• To do so, first we need to check if the inverse exists. That is, if (I − β1 L) is a nonsingular matrix. So, we need to check that:
1. (I − β1 L) is a square matrix
2. |(I − β1 L)| ≠ 0, so we have:
(I − β1 L)^{−1} = (1/|(I − β1 L)|) C′
, in which C′ is the transpose of the cofactor matrix.
Condition 1 is satisfied given that β1 is a square matrix.
• For condition 2, notice that |(I − β1 L)| is a second-order polynomial in L:
|(I − β1 L)| = det
[ 1 − β1x L   β2x L     ]
[ β1y L      1 − β2y L ]
• We know that we can find the roots of the characteristic polynomial π(z) = |(I − β1 z)|, zi, and express it as:
π(z) = (1 − (1/z1)z)(1 − (1/z2)z)
• Further, we know that the characteristic function (1 − ρz)^{−1} is regular (analytic) for all z ∈ C∖L, with L the half-line cut defined in Section 2.3. Thus, from Taylor's Theorem, π(z)^{−1} can be expressed as a power series with z0 ∈ C∖L.
• The Taylor series expansion around zero is valid for |zi^{−1} z| < 1
• For |z| = 1, the Taylor series expansion around zero is valid if
|zi| > 1
Otherwise, the infinite sum does not converge to the function (1 − ρz)^{−1}.
• Thus, we need the roots of the characteristic polynomial |I − β1 z| to be outside the unit circle.
• If so, we have that the solution for the system is given by:
yt = (I − β1 L)^{−1} vt
yt = (1/|(I − β1 L)|) C′ vt
yt = (1/(1 − π1 L − π2 L²)) C′ vt
yt = [(1 − (1/z1)L)(1 − (1/z2)L)]^{−1} C′ vt
yt = (1 − (1/z1)L)^{−1} (1 − (1/z2)L)^{−1} C′ vt
yt = (Σ_{j=0}^{∞} (1/z1)^j L^j)(Σ_{j=0}^{∞} (1/z2)^j L^j) C′ vt
yt = Σ_{j=0}^{∞} ψj L^j C′ vt
yt = Σ_{j=0}^{∞} ψj L^j
[ 1 − β2y L   −β2x L     ]
[ −β1y L      1 − β1x L ] vt
yt = Σ_{j=0}^{∞}
[ c1j   c2j ]
[ c3j   c4j ] L^j vt
yt = Σ_{j=0}^{∞} Cj L^j vt
• So, we have an infinite MA representation for each variable with two errors. Using Theorem 6, we know that the solution not only converges in the second moment but is also a stationary solution.
• Therefore, if the roots of the characteristic polynomial |I − β1 z| are outside the unit circle, the stationary solution of the system is given by:
yt = Σ_{j=0}^{∞} Cj L^j vt (30)
|I − β1 z| = π(z)
|Iz − β1| = π̃(z)
π̃(z) = z^p π(z^{−1})
• Thus, the roots λi of π̃(z) are the inverses of the roots zi of π(z):
1/zi = λi
• Recall that the roots of the (reflected) characteristic polynomial |Iλ − β1| are the eigenvalues of the matrix β1.
• Thus, we have that talking about the roots of the characteristic polynomial |I − β1 z| is the same as talking about the inverses of the eigenvalues of β1.
• Therefore, the VAR(1) is stationary if the roots of the characteristic polynomial |I − β1 z| are outside the unit circle:
|zi| > 1
• or if the eigenvalues of β1 (i.e., the roots of the (reflected) characteristic polynomial |Iλ − β1|) are inside the unit circle:
|λi| < 1
Then, the VAR(1) has the following stationary solution:
yt = Σ_{j=0}^{∞} Cj L^j vt (31)
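A minimal sketch with a hypothetical coefficient matrix: checking stationarity of a VAR(1) through the eigenvalues of β1, and computing the first MA(∞) matrices, which for a VAR(1) reduce to the powers β1^j (obtained by iterating the recursion).

```python
# Sketch: VAR(1) stationarity check and MA(infinity) coefficient matrices.
import numpy as np

B1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])                  # hypothetical VAR(1) coefficient matrix
eig = np.linalg.eigvals(B1)
print(np.abs(eig))                           # all < 1: the VAR(1) is stationary

# First few MA(infinity) matrices C_j = B1^j
C = [np.linalg.matrix_power(B1, j) for j in range(5)]
print(C[1])                                  # the one-step-ahead response matrix
```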
• By defining the following vectors and matrices we end up with a very simple form for the VAR(2)
model. Let:
(I − β 1 L − β 2 L2 )yt = vt
• Let’s define:
A(L) = (I − β 1 L − β 2 L2 )
• So, we would like to be able to express the solution as:
yt = A(L)^{−1} vt
• To do so, first we need to check if the inverse exists. That is, if A(L) is a nonsingular matrix. So, we need to check that:
1. A(L) is a square matrix
2. |(I − β1 L − β2 L²)| ≠ 0, so we have:
(I − β1 L − β2 L²)^{−1} = (1/|(I − β1 L − β2 L²)|) C′
, in which C′ is the transpose of the cofactor matrix.
Condition 1 is satisfied given that β1 and β2 are square matrices.
• For condition 2, notice that |(I − β1 L − β2 L²)| is a fourth-order polynomial in L.
• Further, we know that the characteristic function (1 − ρz)^{−1} is regular (analytic) for all z ∈ C∖L. Thus, from Taylor's Theorem, π(z)^{−1} can be expressed as a power series with z0 ∈ C∖L.
• The Taylor series expansion around zero is valid for |zi^{−1} z| < 1
• For |z| = 1, the Taylor series expansion around zero is valid if
|zi| > 1
Otherwise, the infinite sum does not converge to the function (1 − ρz)^{−1}.
• Thus, we need the roots of the characteristic polynomial |I − β1 z − β2 z²| to be outside the unit circle.
• If so, we have that the solution for the system is given by:
yt = |A(L)|^{−1} C′ vt
yt = π(L)^{−1} C′ vt
yt = (Σ_{j=0}^{∞} (1/z1)^j L^j)(Σ_{j=0}^{∞} (1/z2)^j L^j)(Σ_{j=0}^{∞} (1/z3)^j L^j)(Σ_{j=0}^{∞} (1/z4)^j L^j) C′ vt
yt = Σ_{j=0}^{∞} ψj L^j C′ vt
yt = Σ_{j=0}^{∞} ψj L^j
[ 1 − β2y L − β4y L²   −β2x L + β4x L²     ]
[ −β1y L + β3y L²      1 − β1x L − β3x L² ] vt
yt = Σ_{j=0}^{∞}
[ c1j   c2j ]
[ c3j   c4j ] L^j vt
yt = Σ_{j=0}^{∞} Cj L^j vt
• So, we have an infinite MA representation for each variable with two errors. Using Theorem 6, we know that the solution not only converges in the second moment but is also a stationary solution.
• Therefore, if the roots of the characteristic polynomial |A(z)| are outside the unit circle:
|zi| > 1
• or if the roots of the (reflected) characteristic polynomial |Iλ² − β1 λ − β2| are inside the unit circle:
|λi| < 1
the stationary solution of the system is given by:
yt = Σ_{j=0}^{∞} Cj L^j vt (36)
• For the general VAR(p) case, let's define
A(L) = (I − β1 L − β2 L² − β3 L³ − · · · − βp L^p)
A(L)yt = vt
When solving the system, the difference is in the degree of the characteristic polynomial |A(z)|: it is a polynomial of degree p × n in z.
• The stationary solution exists if the roots of |A(z)| are outside the unit circle:
|zi| > 1
• or if the roots of the (reflected) characteristic polynomial are inside the unit circle:
|λi| < 1
In that case the stationary solution is yt = Σ_{j=0}^{∞} Cj L^j vt, in which Cj is an n × n matrix.
10 Structural Vector Autoregressions
10.1 Motivation
• A classic question in empirical macroeconomics:
– What is the effect of a policy intervention (interest rate increase, fiscal stimulus) on macroeconomic aggregates of interest (output, inflation, etc.)?
• Let Yt be a vector of macro time series.
• Let ε^r_t denote an unanticipated monetary policy intervention.
• We want to know the dynamic causal effect of ε^r_t on Yt:
∂Y_{t+h}/∂ε^r_t, h = 1, 2, 3, . . .
where the partial derivative holds all other interventions constant.
• In macro, this dynamic causal effect is called the impulse response function (IRF) of Yt to the "shock" (unexpected intervention) ε^r_t.
• The challenge is to estimate
∂Y_{t+h}/∂ε^r_t
from observational macro data.
• Two conceptual approaches to estimating dynamic causal effects (IRF):
– Structural model (Cowles Commission): DSGE or SVAR
– Quasi-Experiments
Yt = A1 Yt−1 + . . . + Ap Yt−p + ut
or
A(L)Yt = ut
where
A(L) = I − A1 L − A2 L2 − . . . − Ap Lp
where Ai are the coefficients from the (population) regression of Yt on Yt−1 , . . . , Yt−p .
• If ut were the shocks, then we could compute the structural IRF using the MA representation of the
VAR,
Yt = A(L)−1 ut
• But in general ut is affected by multiple shocks: in any given quarter, GDP changes unexpectedly for
a variety of reasons.
• Is there a way to identify the structural shocks?
• For that we need to find the relationship between the reduced VAR and structural VAR.
10.3 The Structural VAR
• Consider a bivariate first-order VAR model:
yt = b10 − b12 xt + γ11 yt−1 + γ12 xt−1 + εyt
xt = b20 − b21 yt + γ21 yt−1 + γ22 xt−1 + εxt
• The error terms (structural shocks) εyt and εxt are white noise innovations with standard deviations σy and σx and a zero covariance.
• From this representation we would be able to find the IRFs for the structural shocks!
• However, we have an issue:
– The two variables y and x are endogenous.
– Note that the shock εyt affects y directly and x indirectly.
• It is worth pointing out that there are 10 parameters to estimate.
10.6 Reduced form to structure
• The Reduced VAR:
A(L)Yt = ut
Yt = A(L)^{−1} ut = C(L)ut
A(L) = I − A1 L − A2 L² − . . . − Ap L^p
E[ut u′t] = Σu (unrestricted)
• Because εt = Rut,
RA(L)Yt = Rut = εt
B(L)Yt = εt
Yt = D(L)εt
• Recall that B0 = R.
10.6.2 Identification of R
• In population, we can know A(L).
• If we can identify R, we can obtain the SVAR coefficients, B(L) = RA(L).
• The question here is how we can identify R:
• The question here is how can we identify R:
– Identification by Short Run Restrictions: Sims (1980)
– Identification by Long Run Restrictions: Blanchard and Quah (1989)
– Identification from Heteroskedasticity: Rigobon (2003)
– Identification by Sign Restrictions: Uhlig (2005)
– Identification by External Instruments: Stock (2007), Stock and Watson (2012); Mertens and
Ravn (2013); Gertler and P. Karadi (2014); for IV in VAR (not full method) see Hamilton (2003),
Kilian (2009).
• The answer lies in the following equations:
Rut = εt
E[ut u′t] = Σu
E[εt ε′t] = Σε
Σε = RΣu R′
• Notice that Σu is identified, but the unidentified parameters here are Σε and R.
RΣu R′ = Σε
– Now we can clearly see that we have to identify k parameters from Σε, since Σε is a diagonal matrix.
– Further, we also need to identify the parameters inside R, that is, k × k parameters (remember that in general R does not need to have the diagonal full of 1s).
– Therefore, we have k + k² parameters to identify.
• How many parameters do we have already identified?
– The ones on the RHS.
– Recall that Σu is not a diagonal matrix because the reduced shocks are correlated.
– Given the structure of a variance-covariance matrix, we do not have k × k identified parameters, because the lower triangle equals the upper triangle.
– We only have k(k + 1)/2 identified parameters.
• How many parameters on the LHS can be identified?
– Given the above, take the Cholesky factorization of the identified matrix, with P lower triangular:
PP′ = Σu
From Φ0 Σε^{1/2} = P, it follows that
Φ0 = PΣε^{−1/2}
– Given either of the normalizations, we end up with the condition that Φ0 must be lower triangular (it is pretty evident if you normalize such that Σε = I).
– If Φ0 is lower triangular, then R^{−1} is lower triangular.
– Since R = B0, it follows that B0 is lower triangular.
– Since B0 contains the coefficients associated with the contemporaneous relationships in the structural VAR, we are imposing a structure on the short-run dynamics between the variables.
• If B0 is lower triangular, we are imposing the idea that:
– yi,t for i = 2, 3, 4, . . . , k has no contemporaneous effect on y1,t.
– The residuals u1,t are due to pure shocks to y1,t.
– yi,t for i = 3, 4, . . . , k has no contemporaneous effect on y2,t.
– yi,t for i = 4, 5, . . . , k has no contemporaneous effect on y3,t . . . and so on.
– All the structural shocks have a contemporaneous effect on the last variable, yk,t.
• This description of identification is via method of moments, however identification can equally be
described via IV, e.g. see Blanchard and Watson (1986).
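A compact numerical sketch of the short-run (Cholesky) identification just described, under the normalization Σε = I (the Σu below is a hypothetical reduced-form covariance): the Cholesky factor P of Σu gives a lower triangular R = P^{−1} that orthogonalizes the reduced shocks.

```python
# Sketch: short-run identification via Cholesky, with Sigma_eps normalized to I.
import numpy as np

Sigma_u = np.array([[1.0, 0.4],
                    [0.4, 0.5]])             # hypothetical reduced-form covariance
P = np.linalg.cholesky(Sigma_u)              # lower triangular, P P' = Sigma_u
R = np.linalg.inv(P)                         # lower triangular R = B0

# Check: R Sigma_u R' = I, i.e. the implied structural shocks are uncorrelated
print(np.round(R @ Sigma_u @ R.T, 10))       # identity matrix
```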
• Recall Theorem 8: if
yt = Σ_{j=0}^{∞} ψj L^j xt = ψ(L)xt
then
sy(ω) = |Σ_{j=0}^{∞} ψj e^{−iωj}|² sx(ω)
sy(ω) = |ψ(e^{−iω})|² sx(ω)
– Since the Spectral density is a Fourier Transform, we can always apply the Fourier Inverse to find each coefficient inside the infinite sum of the Fourier Transform, which in this case is γj (i.e., each autocovariance of xt).
– The Fourier Inverse of the Spectral density of xt gives us:
γj = (1/2π) ∫_{−π}^{π} sx(ω) e^{iωj} dω
– Thus, from an AR(p) representation for yt, we have that the Spectral Density of yt is given by:
sy(ω) = |Σ_{j=0}^{∞} ψj e^{−iωj}|² sε(ω)
sy(ω) = |ψ(e^{−iω})|² sε(ω)
, in which ω ∈ (−π, π] represents the frequency.
– That was for a univariate yt, but what happens if Yt is multivariate?
– I will try to answer this intuitively.
– In the univariate case, we know that the Inverse Fourier of the spectral density of yt is the autocovariance function of yt.
– If we are in the bivariate case, we should expect that the Inverse Fourier of the Spectral Density of Yt gives us the autocovariance function too.
– However, for this case we should get a 2 × 2 matrix as follows:
[ γy(h)     γy,x(h) ]
[ γx,y(h)   γx(h)   ]
– Thus, it must be the case that the Spectral Density is also a matrix.
– Each element on the main diagonal is the autospectrum (i.e., the Fourier transform of the autocovariance function of each variable).
– The elements on the off-diagonals are the cross-spectra between yi,t and yj,t, i ≠ j (i.e., the Fourier transforms of the covariance functions of yi,t and yj,t).
– For Yt multivariate, we have a modified version of Theorem 8.
Theorem 9 Consider
Yt = A(L)^{−1} Xt
∗ Then, if |A(z)| converges on the unit circle, we have that Yt is stationary with its Spectral density given by:
sy(ω) = A(e^{−iω})^{−1} sx(ω) (A(e^{−iω})^{−1})′
– In our reduced-form VAR case:
sy(ω) = A(e^{−iω})^{−1} su(ω) (A(e^{−iω})^{−1})′
– For the structural form, recall:
Rut = εt
B(L) = RA(L)
B(L)Yt = εt
sy(ω) = B(e^{−iω})^{−1} sε(ω) (B(e^{−iω})^{−1})′
– Now, remember we are interested in the long-run variance, which is the spectral density at zero frequency:
sy(0) = A(1)^{−1} su(0) (A(1)^{−1})′
sy(0) = B(1)^{−1} sε(0) (B(1)^{−1})′
– Given that the structural shocks are not correlated and that they are i.i.d. (the autocovariances are zero), we have that:
sε(0) = Σε
sy(0) = A(1)^{−1} su(0) (A(1)^{−1})′
sy(0) = B(1)^{−1} Σε (B(1)^{−1})′
Thus,
A(1)^{−1} su(0) (A(1)^{−1})′ = B(1)^{−1} Σε (B(1)^{−1})′
A(1)^{−1} su(0) (A(1)^{−1})′ = (RA(1))^{−1} Σε ((RA(1))^{−1})′
sy(0) = D(1)D(1)′
2. The structural shock variance breaks at date s: Σε,1 before, Σε,2 after.
3. R doesn't change between variance regimes.
• Let's normalize R to have 1's on the diagonal.
• Thus the unknowns are:
– In R we would have had k × k unknown parameters, but given the normalization of 1's on the diagonal, we only have k² − k unknown parameters.
– Σε,1 is a diagonal matrix, so we have k unknown parameters.
– Σε,2 is a diagonal matrix, so we have k unknown parameters.
• Recall that Rut = εt, so by looking at the variance we have that:
First period: RΣu,1 R′ = Σε,1
Second period: RΣu,2 R′ = Σε,2
We can rewrite the above equations such that the identified parameters are on the LHS and the unidentified ones are on the RHS:
First period: Σu,1 = R^{−1} Σε,1 (R′)^{−1}
Second period: Σu,2 = R^{−1} Σε,2 (R′)^{−1}
Thus, we have:
– For the first period, on the LHS we have k(k + 1)/2 identified parameters.
– For the second period, we have another k(k + 1)/2 identified parameters.
– On the RHS, from both periods, we have k² − k unidentified parameters for R, k unidentified parameters for Σε,1, and k unidentified parameters for Σε,2.
– In total, we have k(k + 1) identified parameters and k² + k unidentified parameters, so the order condition holds with equality.
• There is a rank condition here too - for example, identification will not be achieved if Σε,1 and Σε,2
are proportional.
• The break date need not be known as long as it can be estimated consistently
• Different intuition: suppose only one structural shock is homoskedastic. Then find the linear combi-
nation without any heteroskedasticity!
11 Cointegrated VAR
11.1 Motivation
• Many economic variables are not stationary, and we consider the type of non-stationarity that can be removed by differencing.
• Let's think about the reduced VAR in first differences.
• Is it always the right approach if all the variables are I(1)?
• What if the variables share a long-run relationship?
• That is, a relationship that is stable in the long run.
• Wouldn't it make sense that this stable relationship should be a regressor in the VAR in first differences?
• In the short-run dynamics, the movements of the variables might be guided by a long-run relationship.
11.2 I(0) process
• Let in the following εt be a sequence of independent identically distributed p-dimensional random variables with mean zero and variance matrix Ω.
• A linear process defined by
Yt = Σ_{j=0}^{∞} Cj ε_{t−j}
is I(0) if:
1. Σ_{j=0}^{∞} Cj z^j is convergent for |z| ≤ 1
2. C(1) = Σ_{j=0}^{∞} Cj ≠ 0
• A process Xt is called I(d) if
Δ^d Xt
is I(0). An important aspect of this definition is that it is enough that the infinite sum fails to converge for just one element of Xt, thus allowing the component processes to be integrated of different orders. Remember that in general we have p infinite sums, one for each element of Xt.
• Xt is said to be cointegrated with cointegrating vector β ≠ 0 if
β′Xt
is I(0).
11.4.1 Example
• Consider the following process:
X1t = Σ_{i=1}^{t} ε1i + ε2t
X2t = a Σ_{i=1}^{t} ε1i + ε3t
X3t = ε4t
• Clearly X3t is I(0), but the vector process Xt = (X1t, X2t, X3t)′ is an I(1) process, since the other two elements contain infinite sums that do not converge (each has one unit root).
• It has two cointegrating vectors:
β1 = (a, −1, 0)′
β2 = (0, 0, 1)′
To see this, notice that:
β1′ Xt = a Σ_{i=1}^{t} ε1i + a ε2t − a Σ_{i=1}^{t} ε1i − ε3t + 0
β1′ Xt = a ε2t − ε3t
, which is stationary.
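Simulating this example (a = 2 is an arbitrary choice): the combination β1′Xt removes the common stochastic trend, while X1t alone keeps wandering.

```python
# Sketch: one common trend, two I(1) series, and a stationary combination.
import numpy as np

rng = np.random.default_rng(4)
T, a = 2000, 2.0
e1, e2, e3, e4 = rng.standard_normal((4, T))

trend = np.cumsum(e1)            # common stochastic trend (a random walk)
X1 = trend + e2                  # I(1)
X2 = a * trend + e3              # I(1), same trend scaled by a
X3 = e4                          # I(0)

combo = a * X1 - X2              # beta1' X_t = a*e2_t - e3_t: the trend cancels
print(combo.std())               # stays bounded (stationary combination)
print(X1.std())                  # much larger: X1 wanders far from zero
```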
11.6 Cointegrated VAR(1)
To get some intuition for the Error Correction Model, we will start by analyzing the simple case of a cointegrated VAR(1):
yt = β1 yt−1 + vt
• We know that if the roots of the characteristic polynomial |I − β1 z| are outside the unit circle, then the process has a stationary solution:
yt = Σ_{j=0}^{∞}
[ c1j   c2j ]
[ c3j   c4j ] L^j vt
• Now suppose instead that one root lies on the unit circle. Starting from
yt = β1 yt−1 + vt
A(L)yt = vt
yt = (1/(1 − π1 L − π2 L²)) C′ vt
yt = (1 − (1/z1)L)^{−1} (1 − (1/z2)L)^{−1} C′ vt
Let's suppose that z1 = 1, so we have one unit root. Then we have that:
yt = (1 − L)^{−1} (1 − (1/z2)L)^{−1} C′ vt
(1 − L)yt = (1 − (1/z2)L)^{−1} C′ vt
Δyt = (1 − (1/z2)L)^{−1} C′ vt
Given that |z2| > 1, the RHS does converge in the second moment. Thus, we have that
Δyt = Σ_{j=0}^{∞} C̃j v_{t−j}
11.6.2 Is it cointegrated?
• Let's assume that there is a vector β such that β′yt is stationary; then yt is cointegrated.
• Recall the rank factorization: any matrix A of rank r can be written as A = CF, with C of dimension n × r and F of dimension r × n.
• Subtracting yt−1 from both sides of the VAR(1):
Δyt = (β1 − I)yt−1 + vt
Let's define:
Π = −(I − β1)
Then, we have that:
Δyt = Πyt−1 + vt (40)
• From section 9.4.1, we know that Δyt is stationary, and by definition vt is stationary.
• We also know that yt−1 is I(1).
• Thus, for (40) to be true, it must be that either:
– Πyt−1 is stationary
– or Π = 0
• Given the dimensions of β1, Π is a 2 × 2 matrix.
• We would like to immediately think that Πyt−1 is the cointegrated relationship, but not quite.
• Let's use the rank decomposition of Π:
Π = αβ′
• Could Π be of full rank? Suppose it were; then from
Δyt = Πyt−1 + vt
we could write yt−1 = Π^{−1}(Δyt − vt). Given that Π^{−1} is a finite constant matrix, we still have that Π^{−1}Δyt and Π^{−1}vt are stationary. Therefore, the only way in which the above equation can be true is if yt−1 is stationary, which is a contradiction given that it is I(1). Thus:
0 ≤ r < n
• So, the rank of the matrix Π is what we should try to find if we are looking for the number of cointegrating vectors.
• If yt is I(1), we can always express the Error Correction Model as follows:
Δyt = αβ′ yt−1 + vt
, in which Π = αβ′, α is 2 × r and captures the adjustment vectors (to the long-run equilibrium), β is 2 × r and captures the cointegrating vectors (the long-run equilibrium), and r is the rank of the matrix Π.
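A sketch of reading a rank factorization Π = αβ′ off the SVD, for a hypothetical 2 × 2 matrix Π of rank 1 (so there would be r = 1 cointegrating vector):

```python
# Sketch: rank factorization Pi = alpha * beta' via the SVD.
import numpy as np

Pi = np.array([[-0.2, 0.4],
               [-0.1, 0.2]])                 # rank 1 by construction
U, s, Vt = np.linalg.svd(Pi)
r = np.sum(s > 1e-10)                        # numerical rank
alpha = U[:, :r] * s[:r]                     # 2 x r adjustment coefficients
beta = Vt[:r].T                              # 2 x r cointegrating vector(s)
print(r, np.allclose(alpha @ beta.T, Pi))    # 1, True
```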
11.9 Cointegrated VAR(p)
11.9.1 VECM representation
• Let's assume the process yt is I(1) with n elements.
• Thus, Δyt is I(0).
• Whether or not there are cointegrating vectors, the VECM representation is valid, as the rank of Π tells us the number of cointegrating relationships.
• The idea is similar to the one in section 9.6 but with some extra tricks.
• The goal is to get a system of equations in first differences such that there exists an explicit Π matrix whose rank tells us the number of cointegrating relationships.
• From the VAR(p), let's subtract yt−1 on both sides so we get the first difference of the process on the LHS:
Δyt = (β1 − I)yt−1 + β2 yt−2 + · · · + βp yt−p + vt
We would like to use the same logic as in the VAR(1), but it is not possible since the above equation has the variables in levels, which are I(1), and we need all the variables in the VECM to be I(0).
• TRICK: To fix the problem mentioned above, we proceed to use the following trick:
yt−1 = yt − Δyt
Thus, we proceed to transform all the variables in levels so that they can be expressed in first differences. We then have some first differences of the variables, but we still have some variables in levels. Thus, we have to continue the replacing until the only variable in levels is yt−1.⁹
• By doing so, we end up with the VECM representation of a VAR(p):
Δyt = Πyt−1 + Σ_{j=1}^{p−1} Γj Δy_{t−j} + vt (42)
with
Γj = −Σ_{i=j+1}^{p} βi, j = 1, . . . , p − 1
Π = −(I − β1 − . . . − βp)
⁹ e.g., yt−3 = yt−2 − Δyt−2 = yt−1 − Δyt−1 − Δyt−2
• In the VECM representation, we have that the variable on the LHS is I(0). Thus, for (42) to be true,
we require all the elements on the RHS to be I(0), so Πyt−1 must be such that it contains all the
cointegrating vectors.
• So, using the rank factorization, we have that:
Π = αβ′
with rank
0 ≤ r < n
since if Π were full rank, all the variables in levels would be I(0) (see the explanation in section 9.6). If the rank is zero, then Π is a zero matrix and the variables are not cointegrated.
Π = −(I − β1 − . . . − βp)
A(L) = I − β1 L − β2 L² − . . . − βp L^p
So, there is a relationship between the lag polynomial from the VAR(p) in levels and the Π matrix in the VECM representation:
Π = −A(1) (43)
• So finding the rank of Π amounts to finding the number of nonzero eigenvalues of −A(1).
• Furthermore, from the VAR(p) analysis in the previous section, we know that the first condition when solving the system is that A(L)^{−1} exists. For ensuring that, we require |A(L)| ≠ 0.
• Given the explicit form of A(L), we know that |A(z)| is a polynomial of degree p × n in z.
• So, for A(L)^{−1} to exist we need the characteristic polynomial |A(z)| to not be equal to zero:
• Using the Fundamental Theorem of Algebra, we know that we can express the above polynomial in terms of its roots, zi:
|A(z)| = π(z) = (1 − (1/z1)z)(1 − (1/z2)z) · · · (1 − (1/z_{p×n})z) ≠ 0 (46)
• We know that Taylor series expansion is valid for each element in the multiplication for |z| ≤ 1 if
|zi | > 1
• Now, notice that z = 1 is just one particular case such that |z| = 1, but it is relevant in this case since
even for this trivial case, for the stationary solution to exist we need |zi | > 1.
• Now, remember that −A(1) = Π, so:
|−A(1)| = |Π|
and given that Π cannot be full rank (i.e., Π^{−1} does not exist, the matrix is singular) if the variables are I(1) (see section 9.7.1), it must be that
|Π| = 0
|A(1)| = 0
which requires at least one root such that
zi = 1
• Thus, having at least one unit root in the characteristic polynomial |A(z)| ensures that Π is not a full
rank matrix and the VECM representation is valid (i.e., the rank of Π can be zero but it is not full
rank so the variables are I(1)).
• Finally, if at least one root of the characteristic polynomial |A(z)| is such that
|zi| < 1
then, for that root zi, the Taylor expansion of (1 − (1/zi)z)^{−1} is not valid for |z| = 1. Thus, the characteristic polynomial |A(z)|^{−1} cannot be expanded as a Taylor series, because the infinite sum for |z| = 1 does not converge (i.e., it is infinite).
• Notice that having |zi| < 1 for the characteristic polynomial |A(z)| does not make |A(1)| = 0, but we would have that:
Π^{−1} = |Π|^{−1} C′
Π^{−1} = |A(1)|^{−1} C′
in which C′ is the transpose of the cofactor matrix. Now, for the root such that |zi| < 1, the Taylor series expansion of (1 − (1/zi)z)^{−1} is not valid for |z| = 1, so |A(z)|^{−1} is not convergent for |z| = 1: each element is expanded using Taylor's Theorem, giving an infinite sum for each element, but for |z| = 1 the infinite sum linked with the explosive root, |zi| < 1, is not convergent (i.e., it is infinite). Thus, for |z| = 1, which includes z = 1:
|A(1)|^{−1} = ∞
Π^{−1} = ∞C′ = ∞
• Therefore, whenever some root satisfies
|zi| ≤ 1
, we have that Π^{−1} does not exist, so Π is a singular matrix and it does not have full rank:
r < n
• Recall that X is I(1) if it has 1 unit root, and I(2) if it has 2 unit roots. So the case |zi| < 1 does not fall into the category of an integrated process, because it has an explosive root rather than a unit root.
• Recall, Yt is cointegrated with 0 < r < n cointegrating vectors if there exists an (r × n) matrix B′ such that
B′Yt = [ β1′Yt ; . . . ; βr′Yt ] = [ u1t ; . . . ; urt ] ∼ I(0)
– Phillips and Ouliaris (1990) show that ADF and PP unit root tests applied to the estimated
cointegrating residual do not have the usual Dickey-Fuller distributions under the null hy-
pothesis of no-cointegration.
– Due to the spurious regression phenomenon under the null hypothesis, the ADF and PP unit root tests have asymptotic distributions that depend on:
∗ The deterministic terms in the regression used to estimate β2
∗ The number of variables, n − 1, in Y2t
– Hansen (1992):
∗ The asymptotic distributions of standard cointegration test statistics are shown to depend
both upon regressor trends and estimation detrending methods.
∗ It is suggested that trends be excluded in the levels regression for maximal efficiency.
∗ Fully modified test statistics are asymptotically chi-square.
• β̂ 2,DOLS is consistent, asymptotically normally distributed and efficient (equivalent to MLE) under
certain assumptions (see Stock and Watson (1993))
• This creates a nested set of models
H(0) ⊂ · · · ⊂ H(r) ⊂ · · · ⊂ H(n)
H(0) = non-cointegrated VAR
H(n) = stationary VAR(p)
• This nested formulation is convenient for developing a sequential procedure to test for the number r
of cointegrating relationships.
• Johansen formulates likelihood ratio (LR) statistics for the number of cointegrating relationships as
LR statistics for determining the rank of Π.
• Recall, the rank of Π is equal to the number of non-zero eigenvalues of Π.
• Thus, these LR tests are based on the estimated eigenvalues λ̂1 > λ̂2 > · · · > λ̂n of the estimated matrix Π̂.
• Johansen derived two statistic tests for the number of cointegrating vectors:
1. Trace Statistic
2. Maximum Eigenvalue Statistic
• The asymptotic null distribution of LRtrace (r0 ) is not chi-square but instead is a multivariate version
of the Dickey-Fuller unit root distribution which depends on the dimension n − r0 and the specification
of the deterministic terms.
• Sequential Procedure for Determining the Number of Cointegrating Vectors:
– First test H0 (r0 = 0) against H1 (r0 > 0).
– If this null is not rejected then it is concluded that there are no cointegrating vectors among the
n variables in Yt .
– If H0 (r0 = 0) is rejected then it is concluded that there is at least one cointegrating vector and
proceed to test H0 (r0 = 1) against H1 (r0 > 1).
– If this null is not rejected then it is concluded that there is only one cointegrating vector.
– If the H0 (r0 = 1) is rejected then it is concluded that there is at least two cointegrating vectors.
– The sequential procedure is continued until the null is not rejected.
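A hedged sketch of this sequential trace-test procedure using the implementation in statsmodels (assuming the package is installed; its coint_johansen function returns the trace statistics in the attribute lr1 and the corresponding critical values in cvt). The simulated system below has one common trend, so r = 1 by construction.

```python
# Sketch: Johansen trace test on a simulated bivariate cointegrated system.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(5)
T = 500
trend = np.cumsum(rng.standard_normal(T))            # one common stochastic trend
Y = np.column_stack([trend + rng.standard_normal(T),
                     2 * trend + rng.standard_normal(T)])  # r = 1 here

res = coint_johansen(Y, det_order=0, k_ar_diff=1)
for r0, (stat, cv) in enumerate(zip(res.lr1, res.cvt[:, 1])):
    print(f"H0: r <= {r0}: trace = {stat:.2f}, 5% cv = {cv:.2f}")
```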
12.3.3 Johansen's Maximum Eigenvalue Statistic
• Johansen also derives an LR statistic for the hypotheses H0: r = r0 against H1: r = r0 + 1.
• As with the trace statistic, the asymptotic null distribution of LRmax (r0 ) is not chi-square but instead is
a complicated function which depends on the dimension n−r0 and the specification of the deterministic
terms.
Yt = ΠXt + ΓZt + εt, t = 1, . . . , T
10 In statistics, canonical-correlation analysis (CCA), also called canonical variates analysis, is a way of inferring information from cross-covariance matrices. If we have two vectors X = (X1, . . . , Xn) and Y = (Y1, . . . , Ym) of random variables, and there are correlations among the variables, then canonical correlation analysis will find linear combinations of X and Y which have maximum correlation with each other.
– Y is n × 1
– X is n × 1
– Z is of dimension k
– The hypothesis that Π has reduced rank less than or equal to r is expressed as
Π = αβ′
– α is n × r
– β is n × r
– r < n
• Reduced rank regression algorithm:
– In order to describe the algorithm, we introduce the notation for product moments:
Syx = T^{−1} Σ_{t=1}^{T} Yt Xt′
Syx.z = Syx − Syz Szz^{−1} Szx
– The eigenvalue problem to solve is:
Sxx.z V Λ = Sxy.z Syy.z^{−1} Syx.z V
Λ and V are known as the generalized eigenvalues and eigenvectors of Sxy.z Syy.z^{−1} Syx.z with respect to Sxx.z.
4. Recall that the factorization
Π̂mle = α̂mle β̂′mle
is not unique. Thus, V is normalized so that
V′ Sxx.z V = I
and
V′ Sxy.z Syy.z^{−1} Syx.z V = Λ
With this normalization,
α̂ = Syx.z β̂
Ω̂ = Syy.z − Syx.z β̂ (β̂′ Sxx.z β̂)^{−1} β̂′ Sxy.z
– The normalization of the MLE for β to β̂c,mle will affect the MLE of α but not the MLEs of the other parameters in the VECM.
– β̂c,mle is super consistent.
– Let β̂c,mle denote the MLE of the normalized cointegrating matrix βc. Johansen (1995) showed that
T (vec(β̂c,mle) − vec(βc))
has an asymptotically mixed normal distribution.