
University Extension Course - Summer 2024

Time Series Econometrics1


Prof.: Dr. Ricardo Quineche Uribe2
T.A. Rita Huarancca Delgado3

Contents
1 Complex Numbers 5
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Polar form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Exponential Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Modulus (Absolute value) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Unit circle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6 Convergence of Series of Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.7 Complex Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.8 The Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.8.1 The Derivative function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.8.2 The derivative at a point, z0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.9 Complex-differentiable function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.9.1 What if the function is not continuous? . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.10 Regular Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.11 Convergence of a Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.11.1 Ratio test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.11.2 Root test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.12 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.13 Complex Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.13.1 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.13.2 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.13.3 Pseudo-variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.13.4 Covariance and Complementary Covariance . . . . . . . . . . . . . . . . . . . . . . . . 22
1.13.5 Uncorrelatedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.13.6 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2 Complex functions 22
2.1 The natural logarithm of a complex number . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.1.1 Principal Logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1.2 When is the function continuous? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.3 Complex differentiability of natural logarithm . . . . . . . . . . . . . . . . . . . . . . . 26
2.2 The natural exponential of a complex number . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.1 Complex differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3 The complex valued function: (1 − αz)−1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1 This content cannot be shared, uploaded, or distributed. This content has not been written for publication.
2 The contact email address is [email protected]
3 The contact email address is [email protected]
2.3.1 Complex differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.2 Taylor Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3.3 Convergence of the Taylor Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3 Lag polynomial 36

4 Convergence of Random Variables 37


4.1 Convergence of a Sequence of Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Types of Convergence for Sequences of Random Variables . . . . . . . . . . . . . . . . . . . . 38
4.3 Convergence in Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.1 Some intuition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.2 Markov’s Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 Convergence in probability of an Infinite Sum of Random Variables: Absolutely Summable . . 41
4.5 Convergence in q th moment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5.1 Convergence in 1st moment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5.2 Convergence in 2nd moment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.6 Convergence in second moment of an Infinite Sum of Random Variables: Absolutely Summable 43
4.6.1 Usefulness of this exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.6.2 Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.7 Convergence in second moment of an Infinite Sum of Random Variables: Square Summable . 49
4.7.1 Usefulness of this exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.7.2 Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

5 Stationary Time Series 50


5.1 Strict Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2 Covariance Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3 The Autocovariance and Autocorrelation Function . . . . . . . . . . . . . . . . . . . . . . . . 51
5.4 Complex-Valued Stationary Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.5 The Complex-Valued Autocovariance Function . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.6 Stationarity of an Infinite sum of random variables (Real-valued/complex-valued) . . . . . . . 53
5.6.1 Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

6 Spectral Representation 54
6.1 Frequency Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.2 Frequency Domain versus Time Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.3 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.4 Fourier Transform - Some Intuition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.4.1 Winding the original series around a circle . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.4.2 The winding frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.4.3 The center of mass of the winding graph . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.4.4 The center of mass as a function of the winding frequency . . . . . . . . . . . . . . . . 56
6.4.5 Winding the original series around a circle on the Complex Plane . . . . . . . . . . . . 57
6.5 Fourier Transform - Formal Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.6 Fourier Inversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.6.1 Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.7 Alternative Conventions for formal definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.7.1 Putting the factor of 1/(2π) in the Fourier transform instead of in its inverse . . . . . 66
6.7.2 Splitting the factor of 1/(2π) evenly between the Fourier transform and its inverse . . . . 66
6.8 Lag Operator Calculus and Fourier Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.8.1 Case 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.8.2 Case 2: AR(m) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.8.3 Case 3: MA(n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

6.9 Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.9.1 Some Intuition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.9.2 Formal Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.9.3 Approximation of the spectral density by the second moment of the Fourier transform 70
6.9.4 Finding the autocovariance: Fourier Inverse . . . . . . . . . . . . . . . . . . . . . . . . 71
6.9.5 Lag Operator Calculus, Stationarity and the Spectral density . . . . . . . . . . . . . . 73

7 AR(P) 77
7.1 The lag polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.2 Solving the difference equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.3 Is the inverse of the lag polynomial well defined? . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.3.1 The roots of the characteristic polynomial . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.3.2 An alternative characteristic polynomial: The reflected polynomial . . . . . . . . . . . 78
7.4 Is the function a regular one? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.4.1 Inverting π(L) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.4.2 Is the characteristic polynomial π(z)−1 analytic on the unit circle, |z| = 1? . . . . . . 82
7.5 Is the process stationary? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.6 The Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.7 Impulse response Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

8 Testing for the presence of Unit Root 86


8.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.2 The most common unit root tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.3 The Dickey-Fuller (DF) Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.4 The Augmented Dickey Fuller - ADF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.5 The Phillips Perron Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.6 The ADF − GLS test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.7 The M statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

9 VAR (p) 90
9.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
9.2 Companion Form Representation of an AR(p) Model . . . . . . . . . . . . . . . . . . . . . . . 91
9.2.1 Stationarity of the Stacked VAR(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
9.3 The VAR(1) model with 2 variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
9.3.1 Matrix Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
9.3.2 Assumptions on the errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
9.3.3 Solving the system of equations: Inverting the matrix . . . . . . . . . . . . . . . . . . 93
9.3.4 The roots of the characteristic polynomial and the eigenvalues . . . . . . . . . . . . . 96
9.3.5 Stationary Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
9.4 VAR(1) with n variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
9.5 The VAR(2) model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
9.5.1 Matrix Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
9.5.2 Solving the system of equations: Inverting the matrix . . . . . . . . . . . . . . . . . . 97
9.5.3 Stationary Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
9.6 The VAR(p) model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
9.6.1 Stationary Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

10 Structural Vector Autoregressions 101


10.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
10.2 The Reduced VAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
10.3 The Structural VAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
10.4 From a SVAR to an RVAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

10.5 The Identification Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
10.6 Reduced form to structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
10.6.1 A note about B0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
10.6.2 Identification of R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
10.7 Identification by Short Run Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
10.8 Identification by Long Run Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
10.9 Identification from Heteroskedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

11 Cointegrated VAR 110


11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
11.2 I(0) process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
11.3 I(d) process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
11.4 Cointegrated process with cointegrating vector β . . . . . . . . . . . . . . . . . . . . . . . . . 111
11.4.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
11.5 Spurious Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
11.6 Cointegrated VAR(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
11.6.1 Finding the integrated order of the process . . . . . . . . . . . . . . . . . . . . . . . . 113
11.6.2 Is it cointegrated? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
11.7 Rank decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
11.8 The Vector Error Correction model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
11.9 Cointegrated VAR(p) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
11.9.1 VECM representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
11.10 The rank of Π . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

12 Estimation of the Cointegrated VAR(P) 119


12.1 Residual-Based Tests for Cointegration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
12.2 Dynamic OLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
12.3 Johansen’s Methodology for Modeling Cointegration . . . . . . . . . . . . . . . . . . . . . . . 121
12.3.1 Likelihood Ratio Tests for the Number of Cointegrating Vectors . . . . . . . . . . . . 121
12.3.2 Johansen’s Trace Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
12.3.3 Johansen’s Maximum Eigenvalue Statistic . . . . . . . . . . . . . . . . . . . . . . . . . 123
12.3.4 Estimation of the Cointegrated VECM . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
12.3.5 Reduced Rank Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
12.3.6 Hypothesis Testing about the coefficients . . . . . . . . . . . . . . . . . . . . . . . . . 125

13 Further Research on Cointegration 126

Appendices 127

Appendix A Some matrix algebra facts 127


A.1 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
A.2 Eigenvalues and the characteristic polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
A.3 Eigen Decomposition ”Matrix Diagonalization” . . . . . . . . . . . . . . . . . . . . . . . . . . 129
A.4 Features of Eigen Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
A.5 Cholesky Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
A.5.1 Example of a variance-covariance matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 131

1 Complex Numbers
1.1 Introduction
z = x + iy (1)

• i = √−1
• Re(z) = x: denotes the real part of z
• Im(z) = y: denotes the imaginary part of z
• The absolute value, or modulus of a complex number is denoted by:

|z| = √(x^2 + y^2)

Notice that the modulus of a complex number is always a real number and in fact it will never be
negative.
• The complex conjugate of a complex number is the number with an equal real part and an imaginary
part equal in magnitude, but opposite in sign.

z̄ = x − iy

An important identity:

z z̄ = |z|2

Thus, the product of a complex number and its conjugate is a real number.
• The modulus of a complex conjugate of a complex number is the same as the modulus of the
original complex number:

|z| = |z̄|

√(x^2 + y^2) = √(x^2 + (−y)^2)
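As a quick sanity check, the following minimal Python sketch (with an arbitrary illustrative value of z) verifies the identities z z̄ = |z|^2 and |z| = |z̄| using the built-in complex type.

```python
# Minimal check of z*conj(z) = |z|^2 and |z| = |conj(z)| (illustrative value only)
z = 3 + 4j

print(z * z.conjugate())            # (25+0j): a real number, equal to |z|^2
print(abs(z) ** 2)                  # 25.0
print(abs(z), abs(z.conjugate()))   # 5.0 5.0
```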

1.2 Polar form


• We can think of complex numbers as vectors as in Figure 1
• r denotes the length of the vector
• r is the hypotenuse
• φ is the angle from the positive real axis to the vector representing z
• φ is usually measured in the standard unit called “radians”.
• The numeric value is given by the angle in radians, and is positive if measured counterclockwise, and
negative if measured clockwise
• −π < φ ≤ π

Figure 1: Complex plane

• φ is also an element of θ, which is the argument of a complex number and it is defined as the
angle inclined from the real axis in the direction of the complex number represented on the complex
plane
θ = arg z

Figure 2: arg(z)

• It can be seen that the argument of any non-zero complex number has many possible values: firstly, as
a geometrical angle, it is clear that whole circle rotations do not change the point, so angles differing
by an integer multiple of 2π radians (a complete circle) are the same, as reflected by Figure 2
• Because a complete rotation around the origin leaves a complex number unchanged, there are many
choices which could be made for θ by circling the origin any number of times
• Thus, arg z is a multi-valued (set-valued) function in the sense that there are a set of values which
includes φ

• The argument of z can be any of the infinite possible values of θ, each of which can be found by solving

  tan θ = y/x

Thus,

  θ = tan^(−1)(y/x)

  θ = arctan(y/x)
The inverse tangent is a multivalued function
• Thus, whenever we see arg z, we know that the function is not unique
• We would like to see φ in a function instead of arg z
• When a well-defined function is required, then the usual choice, known as the principal value, Arg(z):

Arg(z) = φ = arg(z) − 2πn | n ∈ Z ∧ −π < arg(z) − 2πn ≤ π

This represents an angle of up to half a complete circle from the positive real axis in either direction.
• Likewise, the argument of z in terms of the principal value is

arg(z) ∈ {Arg(z) + 2πn | n ∈ Z}

• x is the adjacent
• iy is the opposite
• Our aim in this section is to write complex numbers in terms of a distance from the origin and a
direction (or angle) from the positive horizontal axis.
• From Pythagoras, we have:

r^2 = x^2 + y^2

r = √(x^2 + y^2)

• r is the absolute value (or modulus) of the complex number


• Let’s forget about the i for a moment. Basic trigonometry gives us:

  Sine: sin φ = y/r = opposite/hypotenuse

  Cosine: cos φ = x/r = adjacent/hypotenuse

  Tangent: tan φ = y/x = opposite/adjacent

  Cosecant: csc φ = r/y = hypotenuse/opposite

  Secant: sec φ = r/x = hypotenuse/adjacent

  Cotangent: cot φ = x/y = adjacent/opposite

• Thus, we have:

  tan φ = y/x

  x = r cos φ

  y = r sin φ

• Multiplying the last expression throughout by i gives us:

yi = ir sin φ

• So we can write the polar form of a complex number as:

z = x + yi = r(cos φ + i sin φ) (2)

z = x + yi = |z|(cos φ + i sin φ) (3)


But, this also holds for θ. So a polar representation in terms of θ is also valid:

z = x + yi = r(cos θ + i sin θ)

• So, it is a parametric representation of a circle of radius r
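The conversion between Cartesian and polar form can be checked with a small Python sketch (illustrative value of z, using the standard cmath module):

```python
import cmath

z = 1 + 1j
r, phi = cmath.polar(z)          # r = |z|, phi = Arg(z) in (-pi, pi]
print(r, phi)                    # 1.4142..., 0.7853... (= pi/4)

# Rebuild z from its polar representation r*(cos(phi) + i*sin(phi))
print(cmath.rect(r, phi))        # (1+1j), up to floating-point error
```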

1.3 Exponential Form


• Actually, it is an alternative way of writing the polar form
• From Euler’s formula, for any real number θ:

eiθ = cos θ + i sin θ (4)

where e is the base of the natural logarithm, i is the imaginary unit


• Euler’s formula, named after Leonhard Euler, is a mathematical formula in complex analysis that
establishes the fundamental relationship between the trigonometric functions and the complex ex-
ponential function.
• As θ is a real number, from (2), we have that the exponential form of a complex number, z, with modulus r and arg z = θ is given by:

  z = r e^(iθ)    (5)
– where θ = arg z and so we can see that, much like the polar form, there are an infinite number of
possible exponential forms for a given complex number
– Thus it is not unique unless we limit θ to the principal value

– Also, because any two arguments for a given complex number differ by an integer multiple of 2π,
which is the measure in radians of a complete circle, we will sometimes write the exponential form
as:

z = r e^(i(φ+2πn)),   n = 0, ±1, ±2, . . .

where φ is the principal value of the argument

1.4 Modulus (Absolute value)


• Let’s start by defining the absolute value function for a real number. For any real number x, the
absolute value or modulus of x is denoted by |x| is defined as:
|x| = x, if x ≥ 0
|x| = −x, if x < 0

|x| = √(x^2)
– When x itself is negative (x < 0), its absolute value is necessarily positive: |x| = −x > 0. Since x is negative, −x (−1 times a negative number) is positive
– The absolute value of x is thus always either positive or zero, but never negative
– In the second representation, the square root symbol represents the unique positive square
root (when applied to a positive number)
• The real absolute value cannot be directly applied to complex numbers.
• However, the geometric interpretation of the absolute value of a real number as its distance from 0 can
be generalised.
• The absolute value of a complex number is defined by the Euclidean distance of its corresponding point
in the complex plane from the origin.
• This can be computed using the Pythagorean theorem. From Figure 1, it is easy to see that the
absolute value or modulus of z is:

|z| = √([Re(z)]^2 + [Im(z)]^2) = √(x^2 + y^2)    (6)
|z| = r

Notice that the square root symbol represents the unique positive square root (when applied
to a positive number), which means that the modulus of a complex number is always a real
positive number
• Notice that the absolute value (or modulus) of the complex number is r in the polar form
• To get the value of r, we can take the modulus of both sides of (5) and then do a little simplification as follows:

  |z| = |r e^(iθ)| = |r| |e^(iθ)| = |r| |cos θ + i sin θ| = √(r^2 + 0^2) · √(cos^2 θ + sin^2 θ) = r

since cos^2 θ + sin^2 θ = 1 by the fundamental Pythagorean identity.
• When the imaginary part y is zero, this coincides with the definition of the absolute value of the real
number x.

1.5 Unit circle
• The unit circle is the set of complex numbers whose modulus, |z|, is 1
• On the complex plane they form a circle centered at the origin with a radius of 1
• It includes the value :
– 1 on the right extreme
– i at the top extreme
– −1 at the left extreme
– −i at the bottom extreme
• On its exponential form it can be written as:

z = eiθ

since the modulus of z is 1 (|z| = r = 1). Its conjugate:

z̄ = cos θ − i sin θ = e^(−iθ)

We can also find its modulus. Recall the identity with its conjugate:

  |z| = (z z̄)^(1/2)

  |e^(iθ)| = (e^(iθ) e^(−iθ))^(1/2) = 1

• The interior of the unit circle is called the open unit disk
• While the interior of the unit circle combined with the unit circle itself is called the closed unit disk.

1.6 Convergence of Series of Complex Numbers



• Sequence of complex numbers: {z_i}_{i=1}^∞
• Infinite series of complex numbers: Σ_{i=1}^∞ z_i
• A series Σ_{m=1}^∞ z_m converges to A ∈ C if

  lim_{n→∞} Σ_{m=1}^n z_m = A

that is, if

  lim_{n→∞} |Σ_{m=1}^n z_m − A| = 0

• Theorem 1 If Σ_{m=1}^∞ z_m converges, then

  lim_{m→∞} z_m = 0,

that is, the element z_m inside Σ_{m=1}^∞ z_m goes to zero as m goes to infinity.

• Cauchy’s Criterion. Σ_{m=1}^∞ z_m converges if and only if

  Σ_{m=k}^n z_m → 0

as k, n → ∞
• Absolute Convergence. Σ_{m=1}^∞ z_m converges absolutely if Σ_{m=1}^∞ |z_m| converges.

1.7 Complex Polynomial


A complex polynomial of degree n is a function of the form

P(z) = a_0 + a_1 z + a_2 z^2 + · · · + a_n z^n

P(z) = Σ_{k=0}^{n} a_k z^k

• n is a positive integer, n ≥ 1,
• The ak are complex numbers not all zero
• z is a complex variable
• Any polynomial of degree n has precisely n roots
• A root is a value of z such that P (z) = 0
• The Fundamental Theorem of Algebra: For any polynomial of degree n, we can rewrite the
polynomial in terms of its roots, zi :

P (z) = an (z − z1 ) (z − z2 ) · · · (z − zn )

• Reflected (Reciprocal) polynomial:

  P̃(z) = a_n + a_{n−1} z + a_{n−2} z^2 + · · · + a_0 z^n

That is, the coefficients of P̃(z) are the coefficients of P(z) in reverse order. Notice that:

  P̃(z) = a_0 z^n + a_1 z^{n−1} + · · · + a_{n−1} z + a_n = z^n (a_0 + a_1 z^{−1} + · · · + a_n z^{−n})

Thus, we can rewrite the reflected polynomial as:

  P̃(z) = z^n P(z^{−1})


• Let’s denote λ_i as the roots of the reflected polynomial P̃(z)


• We know that the two polynomials, the original and the reflected one, are related. So, are the roots also related?

• Indeed, the roots are related. When finding the roots of P̃(z), we are finding the values of z such that:

P̃ (z) = 0

, which is the same as finding the values of z such that:

z n P (z −1 ) = 0

Since the values for z cannot be zero, then we are finding the values of z such that:

P (z −1 ) = 0

Let’s denote zi as the roots of the original polynomial, P (z). Given this notation, the roots of P (z −1 )
should be zi−1 since P (z −1 ) is P (z) using the inverse variable. Therefore, we have that:

λi = zi−1 (7)

and that

P̃ (z) = a0 (z − λ1 ) (z − λ2 ) · · · (z − λn )

• Notice that:

  P(z^{−1}) = a_n (z^{−1} − z_1)(z^{−1} − z_2) · · · (z^{−1} − z_n)

and that

  P̃(z^{−1}) = a_0 (z^{−1} − λ_1)(z^{−1} − λ_2) · · · (z^{−1} − λ_n)

• Further, notice that P(z) is also the reflected polynomial of P̃(z):

  P(z) = z^n P̃(z^{−1})

Thus,

  P(z) = z^n a_0 (z^{−1} − λ_1)(z^{−1} − λ_2) · · · (z^{−1} − λ_n)

  P(z) = a_0 z^n (z^{−1} − λ_1)(z^{−1} − λ_2) · · · (z^{−1} − λ_n)

  P(z) = a_0 ((z^{−1} − λ_1)z)((z^{−1} − λ_2)z) · · · ((z^{−1} − λ_n)z)

P (z) = a0 (1 − λ1 z) (1 − λ2 z) · · · (1 − λn z)

• We know that λ_i = z_i^{−1}. Thus, we can also write:

  P(z) = a_0 (1 − (1/z_1) z)(1 − (1/z_2) z)(1 − (1/z_3) z) · · · (1 − (1/z_n) z)    (8)

  P(z) = a_0 (1 − z_1^{−1} z)(1 − z_2^{−1} z) · · · (1 − z_n^{−1} z)
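A minimal numerical illustration of relation (7) with numpy (the coefficients below are arbitrary): the roots of the reflected polynomial are the reciprocals of the roots of the original polynomial.

```python
import numpy as np

# P(z) = 2 - 3z + z^2, coefficients listed as a0, a1, a2 (arbitrary example)
a = [2.0, -3.0, 1.0]

# numpy.roots expects coefficients ordered from the highest power to the lowest
roots_P = np.roots(a[::-1])            # roots z_i of P(z)
roots_reflected = np.roots(a)          # roots lambda_i of the reflected polynomial

print(np.sort(roots_P))                # [1. 2.]
print(np.sort(1.0 / roots_reflected))  # [1. 2.]  -> lambda_i = 1/z_i
```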

1.8 The Derivative


1.8.1 The Derivative function
We can define the derivative function of f (z) as:

f′(z) = lim_{∆z→0} [f(z + ∆z) − f(z)] / ∆z
• Notice that the derivative is a limit, so for the derivative function to exist we need the limit to exist
• Knowing the set of values for which the function is not continuous is relevant since if the function is
not continuous at some values of z, then its derivative does not exist at those values.
• Recall that z = x + iy. Since i is a constant, a change on z can only be triggered by a change in x or
y. Thus:

∆z = ∆x + i∆y

• Since z = x + iy, f (z) can be rewritten such that:

f (z) ≡ f (x, y) = u(x, y) + iv(x, y)

So, u(.) and v(.) are functions of real numbers


• Thus, if we are only considering a change on the real axis, x, we have that:

  f′(z) = lim_{∆x→0} [f(z + ∆x) − f(z)] / ∆x

In terms of u and v:

  f′(z) = lim_{∆x→0} [u(x + ∆x, y) − u(x, y)] / ∆x + lim_{∆x→0} [iv(x + ∆x, y) − iv(x, y)] / ∆x

  f′(z) = lim_{∆x→0} [u(x + ∆x, y) − u(x, y)] / ∆x + i lim_{∆x→0} [v(x + ∆x, y) − v(x, y)] / ∆x

By using the definition of the partial derivative for u and v, we have that

  f′(z) = u_x(x, y) + i v_x(x, y)

• On the other hand, if we are only considering a change on the imaginary axis, y, we have that:

  f′(z) = lim_{∆y→0} [f(z + i∆y) − f(z)] / (i∆y)

where i∆y takes into account the fact that a change in y affects z by +i∆y. In terms of u and v, the change in y can be isolated in the sense that we do not need to express it as +i∆y:

  f′(z) = lim_{∆y→0} [u(x, y + ∆y) − u(x, y)] / (i∆y) + lim_{∆y→0} [iv(x, y + ∆y) − iv(x, y)] / (i∆y)

  f′(z) = lim_{∆y→0} [u(x, y + ∆y) − u(x, y)] / (i∆y) + lim_{∆y→0} i[v(x, y + ∆y) − v(x, y)] / (i∆y)

TRICK: Notice that iy is not an argument in u or v. We need the denominator to be ∆y so that the limit expression has the meaning of a partial derivative. Thus, we multiply the numerator and denominator of the first term by i and proceed as follows:

  f′(z) = lim_{∆y→0} i[u(x, y + ∆y) − u(x, y)] / (i(i∆y)) + lim_{∆y→0} i[v(x, y + ∆y) − v(x, y)] / (i∆y)

  f′(z) = lim_{∆y→0} i[u(x, y + ∆y) − u(x, y)] / (−∆y) + lim_{∆y→0} [v(x, y + ∆y) − v(x, y)] / ∆y

  f′(z) = lim_{∆y→0} −i[u(x, y + ∆y) − u(x, y)] / ∆y + lim_{∆y→0} [v(x, y + ∆y) − v(x, y)] / ∆y

  f′(z) = −i lim_{∆y→0} [u(x, y + ∆y) − u(x, y)] / ∆y + lim_{∆y→0} [v(x, y + ∆y) − v(x, y)] / ∆y

By using the definition of the partial derivative for u and v, we have that

  f′(z) = −i u_y(x, y) + v_y(x, y)

• Thus we have two expressions for f ′ (z):

f ′ (z) = ux (x, y) + ivx (x, y)

f ′ (z) = −iuy (x, y) + vy (x, y)

Both can be used, however they can only be used if the derivative exists
• These two expressions help us obtain the Cauchy-Riemann equations by equating the real parts
and the imaginary parts, respectively:

ux (x, y) = vy (x, y)

vx (x, y) = −uy (x, y)

If both equations hold, then the derivative exists.

• Example:
f (x + yi) = (x + yi)2 = x2 − y 2 + i2xy


Notice that,
u(x, y) = x2 − y 2 and v(x, y) = 2xy
Thus,
ux = 2x = vy
uy = −2y = −vx .
Further, the derivative of f (z) is clearly f ′ (z) = 2z, so:

f ′ (x + iy) = 2(x + iy) = 2x + i2y = ux + ivx = vy − iuy
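As a quick numeric illustration of this example, the sketch below (Python, arbitrary evaluation point) checks the Cauchy-Riemann equations for f(z) = z^2 with central finite differences.

```python
# Cauchy-Riemann check for f(z) = z^2, where u = x^2 - y^2 and v = 2xy
h = 1e-6
x, y = 0.7, -1.3                      # arbitrary evaluation point

u = lambda x, y: x**2 - y**2
v = lambda x, y: 2 * x * y

ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
vx = (v(x + h, y) - v(x - h, y)) / (2 * h)
vy = (v(x, y + h) - v(x, y - h)) / (2 * h)

print(ux, vy)    # both approximately  2x =  1.4
print(vx, -uy)   # both approximately  2y = -2.6
```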

1.8.2 The derivative at a point, z0


The derivative of f (z) at z0 is defined by4

f′(z_0) = lim_{z→z_0} [f(z) − f(z_0)] / (z − z_0)

provided this limit exists.


• where z0 is a complex number and z → z0 means |z − z0 | → 0.
• This limit must be the same no matter how |z − z0 | → 0.
• Since this is the complex plane, |z − z0 | → 0 is not only considering the values coming from the left
and from the right as in the definition of the existence of a limit on R. |z − z0 | is the modulus, so it is
considering all the values inside a circle with radius |z − z0 | on the complex plane
• Since it is possible to approach z0 from many directions on the complex plane, the differentiability
of functions of complex numbers is in a sense stricter than the differentiability of functions of real
numbers
• The function f (z) is said to be differentiable at z0 if its derivative at z0 exists.

1.9 Complex-differentiable function


Let z = x + iy and f (z) = u(x, y) + iv(x, y) on some region G containing the point z0 . If f (z) satisfies
the Cauchy-Riemann equations and has continuous first partial derivatives in the neighborhood of z0 , then
f ′ (z0 ) exists and is given by
f′(z_0) = lim_{z→z_0} [f(z) − f(z_0)] / (z − z_0)
and the function is said to be complex differentiable (or, equivalently, analytic or holomorphic).
Cauchy-Riemann equations: We usually have f (z), but we know that z = x + iy. Thus, the idea is to
rewrite f (z) such that:

f (z) ≡ f (x, y) ≡ u(x, y) + iv(x, y)


4 The difference between this formula and the previous one is that this one evaluates the derivative function at z_0, while the former is just the definition of the derivative function

If f is complex differentiable, then the Cauchy-Riemann equations must hold:
∂u/∂x = ∂v/∂y
and
∂v/∂x = −∂u/∂y
Note: For using these equations, we usually take advantage of the polar or exponential representation of z.
However it is not so easy to do.

1.9.1 What if the function is not continuous?


It is convenient to analyze if the function is continuous at some points or not since:

Differentiability =⇒ Continuity, so ¬ Continuity =⇒ ¬ Differentiability.

This will play an important role when analyzing the complex natural logarithm function.
Example
• Let’s analyze the simple case y = 0, so z = x ∈ R
• Take for example the very simple function:

f(x) = x + 1, if x ≥ 0
f(x) = x, if x < 0

• It is discontinuous at x = 0
• The limit as x approaches zero from the left (negative numbers) is 0, which is not equal to the limit when x approaches zero from the right (positive numbers), which is 1.
• However if we apply the derivative formula like robots, we have that

f ′ (x) = 1

for any x, so we can wrongly conclude the function is differentiable at x = 0


• Recall that we require the derivative to exist at x = 0, so we should use the definition of derivative:

  lim_{x→0} [f(x) − f(0)] / x

To analyze whether it exists, we need to analyze the limit when approaching zero from the positive and from the negative numbers. Analyzing the limit when approaching zero from the negative numbers:

  lim_{x→0⁻} [f(x) − f(0)] / x = lim_{x→0⁻} (x − 1)/x = ∞

So, the limit does not exist. Therefore, the derivative at x = 0 does not exist

1.10 Regular Function
• A function f of the complex number z is analytic at a point z0 if its derivative exists not only at z0
but also at each point z in some neighborhood of z0 .
• A regular (or holomorphic or analytic) function is defined as a complex-valued differentiable
function on an open set D of C. That is, a function is regular on a region D of C, if the complex-valued
function is complex differentiable at every point in the set D of C
• If a function f is analytic at a point, then its derivatives of all orders exist (infinitely differentiable)
and are themselves analytic there.
• The main result about a regular function is that at any point of the domain of definition, D, the
regular function can be expanded in a Taylor’s series that converges in the largest open disk that
does not contain any singularity.
• Another fundamental result is that for two regular functions the composition f (g(z)) is again
regular provided the range of g is in the domain of f
• If two functions f (z) and g(z) are analytic in a domain D, then their sum and their product are both
analytic in D.
• The quotient f(z)/g(z) is also analytic in D provided that g(z) ≠ 0 for any z in D.
• An entire function is a function that is analytic at each point in the entire complex plane
• Every polynomial is an entire function.
• Hence the quotient P(z)/Q(z) of two polynomials is analytic in any domain throughout which Q(z) ≠ 0.

1.11 Convergence of a Power Series


A power series in one variable is an infinite series of the form


f(z) = Σ_{n=0}^{∞} a_n (z − z_0)^n = a_0 + a_1(z − z_0)^1 + a_2(z − z_0)^2 + · · ·

• an represents the complex coefficient of the nth term


• a_n is independent of z and may be expressed as a function of n (e.g., a_n = 1/n!)

• z_0 is the center of the series and is a complex constant


• There is a number r ≥ 0 (possibly ∞) such that the series converges absolutely to an analytic function for |z − z_0| < r.
• The series diverges for |z − z0 | > r
• r is called the radius of convergence
• The disk |z − z0 | < r is called the disk of convergence
• The theorem doesn’t say what happens when |z − z0 | = r.
• If r = ∞ the function f (z) is entire.
• If r = 0 the series only converges at the point z = z0 . In this case, the series does not represent an
analytic function on any disk around z0 .

• The derivative is given by term-by-term differentiation

  f′(z) = Σ_{n=1}^{∞} n a_n (z − z_0)^{n−1}

The series for f′ also has radius of convergence r


• How do we find r? Often (not always) we can find r using the Ratio test and the Root test. Both are
two standard tests from calculus on the convergence of infinite series.

1.11.1 Ratio test


Consider the series

  Σ_{n=0}^{∞} c_n

If L = lim_{n→∞} |c_{n+1}/c_n| exists, then:


• If L < 1 then the series converges absolutely
• If L > 1 then the series diverges.
• If L = 1 then the test gives no information.
Example
• Consider the geometric series

f (z) = 1 + z + z 2 + z 3 + . . . ..

• The limit of the absolute ratios of consecutive terms is

  L = lim_{n→∞} |z^{n+1}| / |z^n| = |z|

• Thus, L exists and for the series to converge, we need L < 1.


• Thus, the geometric series converges when |z| < 1
• Thus, the radius of convergence is 1
• |z − 0| < 1 is the disk of convergence
• We know this converges to 1/(1 − z)
• Note, the disk of convergence ends exactly at the singularity z = 1
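A quick numerical sketch (Python, illustrative point with |z| < 1) showing the partial sums of the geometric series approaching 1/(1 − z):

```python
# Partial sums of the geometric series versus the closed form 1/(1 - z)
z = 0.4 + 0.3j                       # |z| = 0.5 < 1, inside the disk of convergence
partial = sum(z**n for n in range(200))

print(partial)                        # approximately 1.3333 + 0.6667j
print(1 / (1 - z))                    # same value
```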
Example
• Consider the series

  f(z) = Σ_{n=0}^{∞} z^n / n!

• The limit of the absolute ratios of consecutive terms is

  L = lim_{n→∞} |z^{n+1}/(n+1)!| / |z^n/n!| = lim_{n→∞} |z|/(n + 1) = 0

• Thus, L exists and for the series to converge, we need L < 1.


• However, L = 0, so the series converges no matter the value of z
• Since L < 1, this series converges for every z.
• The radius of convergence for this series is ∞. That is, f(z) is entire.

1.11.2 Root test


Consider the series

  Σ_{n=0}^{∞} c_n

If L = lim_{n→∞} |c_n|^{1/n} exists, then:
• If L < 1 then the series converges absolutely.
• If L > 1 then the series diverges.
• If L = 1 then the test gives no information.
Example
• Consider the geometric series

1 + z + z2 + z3 + . . . .

• The limit of the n-th roots of the terms is

  L = lim_{n→∞} |z^n|^{1/n} = lim_{n→∞} |z| = |z|

• Thus, the root test agrees that the geometric series converges when |z| < 1.

1.12 Taylor’s Theorem


Theorem 2 Let f(z) be analytic everywhere inside a region G. Let z_0 ∈ G. Then,


f(z) = Σ_{n=0}^{∞} a_n (z − z_0)^n

where the series converges on any disk |z − z0 | < r contained in G. Furthermore, we have formulas for the
coefficients:
f(z) = f(z_0) + [f′(z_0)/1!](z − z_0) + [f′′(z_0)/2!](z − z_0)^2 + · · ·

That is, the Taylor series of a real or complex-valued function f(z) that is infinitely differentiable (regular) at a real or complex number z_0.

In the more compact sigma notation, this can be written as

  Σ_{n=0}^{∞} [f^(n)(a) / n!] (x − a)^n

where f (n) (a) denotes the n th derivative of f evaluated at the point a. (The derivative of order zero of f is
defined to be f itself and (x − a)0 and 0! are both defined to be 1.)
• The special case of series when z0 = 0 is called the Maclaurin series.
• Notice that it holds with equality because it is not an approximation but an equality; however, the function must be infinitely differentiable (which means that the function is regular). That is why we have an infinite sum.
• If the function is not infinitely differentiable, we can have an approximation of order n
• nth Taylor polynomial: The partial sum formed by the n first terms of a Taylor series is a polynomial
of degree n. Taylor polynomials are approximations of a function, which become generally better
when n increases.

1.13 Complex Random Variable


• Complex random variables are a generalization of real-valued random variables to complex numbers
• Complex random variables can always be considered as pairs of real random variables: their real and
imaginary parts
• Therefore, the distribution of one complex random variable may be interpreted as the joint distribution
of two real random variables
• Some concepts of real random variables have a straightforward generalization to complex random
variables—e.g., the definition of the mean of a complex random variable. Other concepts are unique
to complex random variables

1.13.1 Expectation
• The expectation of a complex random variable is defined based on the definition of the expectation of
a real random variable:

E[z] = E[x] + iE[y]

• Note that the expectation of a complex random variable does not exist if E[x] or E[y] does not exist.
• If the complex random variable Z has a probability density function f_Z(z), then the expectation is given by

  E[z] = ∫_C z · f_Z(z) dz

• If the complex random variable Z has a probability mass function pZ (z), then the expectation is given
by

  E[z] = Σ_z z · p_Z(z)

Properties
• Whenever the expectation of a complex random variable exists, taking the expectation and complex
conjugation commute:

\overline{E[z]} = E[z̄]

• The expected value operator E[·] is linear in the sense that:

E[az + bw] = aE[z] + bE[w]

for any complex coefficients a, b even if z and w are not independent.

1.13.2 Variance
• The variance is defined as

Var[z] = E[|z − E[z]|^2] = E[|z|^2] − |E[z]|^2

Properties
• The variance is always a nonnegative real number
• It is equal to the sum of the variances of the real and imaginary part of the complex random variable:

Var[z] = Var(x) + Var(y)

• The variance of a linear combination of complex random variables may be calculated using the following
formula:

  Var[Σ_{k=1}^{N} a_k Z_k] = Σ_{i=1}^{N} Σ_{j=1}^{N} a_i ā_j Cov[Z_i, Z_j]

for any complex coefficients ak
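A short Monte Carlo sketch (numpy, arbitrary illustrative distributions for the real and imaginary parts) of the property Var[z] = Var(x) + Var(y):

```python
import numpy as np

# Simulate z = x + iy and compare E|z - E[z]|^2 with Var(x) + Var(y)
rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=1_000_000)    # Var(x) = 4
y = rng.normal(-0.5, 1.0, size=1_000_000)   # Var(y) = 1
z = x + 1j * y

var_z = np.mean(np.abs(z - z.mean()) ** 2)
print(var_z)                  # approximately 5.0
print(x.var() + y.var())      # approximately 5.0
```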

1.13.3 Pseudo-variance
• The pseudo-variance is a special case of the pseudo-covariance and is given by

E[(z − E[z])^2] = E[z^2] − (E[z])^2

• Unlike the variance of z, which is always real and positive, the pseudo-variance of z is in general
complex.

1.13.4 Covariance and Complementary Covariance
• The covariance between two complex random variables z and w is defined as:

Kzw = Cov[z, w] = E[(z − E[z])(w̄ − E[w̄])] = E[z w̄] − E[z]E[w̄]

• Notice the complex conjugation of the second factor in the definition


• In contrast to real random variables, we also define a pseudo-covariance (also called complementary covariance):

Jzw = Cov[z, w̄] = E[(z − E[z])(w − E[w])] = E[zw] − E[z]E[w]

Properties

Cov[z, w] = \overline{Cov[w, z]} (Conjugate symmetry)

Cov[αz, w] = α Cov[z, w] (Sesquilinearity)
Cov[z, αw] = ᾱ Cov[z, w]
Cov[z_1 + z_2, w] = Cov[z_1, w] + Cov[z_2, w]
Cov [z, w1 + w2 ] = Cov [z, w1 ] + Cov [z, w2 ]
Cov[z, z] = Var[z]

1.13.5 Uncorrelatedness
Two complex random variables z and w are called uncorrelated if their covariance is zero:

Kzw = Jzw = 0

Thus,

E[z w̄] = E[z]E[w̄]

E[zw] = E[z]E[w]

1.13.6 Orthogonality
As with real random variables, complex quantities are said to be orthogonal if:

E[z w̄] = 0

As always, it does not imply zero covariance unless the means of the variables are zero.

2 Complex functions
2.1 The natural logarithm of a complex number
In complex analysis, a complex logarithm of the non-zero complex number z is

w = ln z

• When is it undefined? If z is a real number, then we know that w is undefined for z ≤ 0
• Thus, we would be tempted to say that this function is undefined when z ≤ 0, but since z is complex,
it is not that simple.
• From the Polar form, we know that z = |z|eiθ . Thus,

ln z = ln(|z| e^(iθ))

ln z = ln |z| + ln e^(iθ)

ln z = ln |z| + iθ ln e

ln z = ln |z| + i arg(z)

Thus, ln z is defined if

|z| > 0
– This means that we want the radius of z on the complex plane to be greater than zero
– Since |z| is always positive for any z ≠ 0, negative values of z are allowed
– That is why we say that ln z is defined for z ∈ C\{0}, which means that the function is defined for all complex numbers (including negative real numbers) except 0
• However, we have a problem: arg(z) is not unique, which means that ln is not the inverse of the
exponential function.
• For the function to be single-valued, we need to define the Principal Logarithm

2.1.1 Principal Logarithm


In complex analysis, the principal logarithm of the non-zero complex number z is

Ln z = ln |z| + i arg(z), −π < arg(z) ≤ π

That is, the principal logarithm is the complex logarithm but using the principal value, Arg(z). Thus, another way to write it is

Ln z = ln |z| + i Arg(z)
• Thus, the function is unique
• Now, I have an expression for the inverse function of ez
Example 1
• Let’s find the principal logarithm for z = i

Ln i = ln |i| + i Arg(i)

• Recall

  z = x + iy

  |z| = √(x^2 + y^2)

Thus,

  |z| = |0 + i(1)| = √(0^2 + 1^2) = 1

• Notice that Arg(i) can be found directly on the complex plane to be π/2. However, we can also find it by:

  Arg(i) = tan^(−1)(y/x)

  Arg(i) = arctan(1/0)

  Arg(i) = π/2

• Thus,

  Ln i = ln 1 + i(π/2)

  Ln i = i(π/2)
Example 2
• Let’s find the principal logarithm for z = 1 + i

Ln(1 + i) = ln |1 + i| + i Arg(1 + i)

• Recall

  z = x + iy

  |z| = √(x^2 + y^2)

Thus,

  |z| = |1 + i(1)| = √(1^2 + 1^2) = √2

• Notice that Arg(1 + i) can be found directly on the complex plane to be π/4. However, we can also find it by:

  Arg(1 + i) = tan^(−1)(y/x)

  Arg(1 + i) = arctan(1/1)

  Arg(1 + i) = π/4

• Thus,

  Ln(1 + i) = ln √2 + i(π/4)
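Python's cmath.log returns exactly this principal logarithm, so both worked examples can be checked directly:

```python
import cmath

# cmath.log gives the principal logarithm Ln z = ln|z| + i*Arg(z)
print(cmath.log(1j))        # approx. 0 + 1.5708j        (= i*pi/2)
print(cmath.log(1 + 1j))    # approx. 0.3466 + 0.7854j   (= ln(sqrt(2)) + i*pi/4)
```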

2.1.2 When is the function continuous?


We care about continuity because if the function is not continuous for some values of z, then it is not
differentiable for those values of z.
On a rough intuitive level, continuous means “looks connected as you zoom in”, and differentiable means
“looks like a line segment as you zoom in”. It can’t look like a line segment without looking connected.
To have a single-valued function, we will focus on the principal logarithm. Thus, for z ∈ C\{0}, we have that
the principal logarithm is given by:

Ln z = ln |z| + i Arg(z)

The function f : C\{0} → C given by f (z) = Ln z is continuous at all z except those along the negative real
axis.
• Let’s analyze it by its components
• ln |z| is clearly continuous for all z ∈ C\{0} since the modulus operator, |.|, is always positive and we
are already excluding the case z = 0, so the argument of ln(.) is always greater than 0.
• The question is if the second component is continuous or not.
• The continuity of Arg(z)
– Arg(z) is noncontinuous on any point on the negative real axis by its definition.
– Recall that Arg(z) is an angle such that:

−π < Arg(z) ≤ π

So, the angle can never be −π. This is exactly what ensures that Arg is single-valued: if y = 0 and x < 0, the angle is always π and never −π. Allowing −π would yield two values in this case, π and −π
– To better understand the locations of π and −π let’s take a look at Figure 3
– Now, when analyzing the limits we should take into account the sign of the angle, since it indicates whether the angle is measured counterclockwise (Arg(z) > 0) or clockwise (Arg(z) < 0)

Figure 3: Arg(z)

– Let’s go to Figure 3 and see the point on the unit circle for the blue line. Let’s suppose it is
(x0 , y0 ). We can see that by fixing x = x0 , if approaching y from above to y0 , the limit for Arg(.)
is π/4. When approaching y from below to y0 , the limit is also π/4
– From Figure 3, we can see that when x < 0, and y goes to zero from above, the limit of the angle
is π
– However, when x < 0, and y goes to zero from below, the limit of the angle is −π
• Therefore, Arg(z) is discontinuous at each point on the nonpositive real axis:
– Let z = x_0 + iy for some fixed x_0 < 0
– If y approaches 0 from above, then Arg(z) ↑ π,
– whereas if y approaches 0 from below, then Arg(z) ↓ −π
• Thus, we have that for Ln(z) is continuous and well-defined for all z ∈ C\L, in which:

L = {z ∈ C : Re(z) ≤ 0 and Im(z) = 0}

• Thus, we see that we need to exclude more than just 0 in order for Ln(z) to be continuous

2.1.3 Complex differentiability of natural logarithm


Recall that if the function is not continuous for some points, then its derivative does not exist and therefore
it is not differentiable at those values. Thus, we need to restrict the set of values for z in f (z)
To have a single-valued, continuous, well-defined function, we will focus on the principal logarithm. For z ∈ C\L, the principal logarithm is given by:

Ln z = ln |z| + i Arg(z)
• Recall z = x + iy, so to find the derivative, we prefer to analyze

f (x, y) = u(x, y) + iv(x, y)

• For rewriting f(z) as f(x, y), we proceed to replace |z| and Arg(z):

  Ln z = ln √(x^2 + y^2) + i tan^(−1)(y/x)

  Ln z = (1/2) ln(x^2 + y^2) + i tan^(−1)(y/x)

Then, we have that:

  f(x, y) = u(x, y) + iv(x, y)

  u(x, y) = (1/2) ln(x^2 + y^2)

  v(x, y) = tan^(−1)(y/x)
• Remember we have that

f ′ (z) = ux (x, y) + ivx (x, y)

Also, notice that for y = arctan(x):

  d(arctan(x))/dx = 1/(1 + x^2)

and that for y = arctan(1/x), we differentiate using the chain rule, which states that d/dx[f(g(x))] is f′(g(x)) g′(x), where f(x) = arctan(x) and g(x) = 1/x. Thus,

  dy/dx = [1/(1 + (1/x)^2)] · d(1/x)/dx

Thus, we have

  u_x(x, y) = (1/2) · 2x/(x^2 + y^2)

  u_x(x, y) = x/(x^2 + y^2)

  v_x(x, y) = [1/(1 + (y/x)^2)] · (−y/x^2)

  v_x(x, y) = [x^2/(x^2 + y^2)] · (−y/x^2)

  v_x(x, y) = −y/(x^2 + y^2)

• Therefore we have:

  f′(z) = u_x(x, y) + i v_x(x, y)

  d(Ln z)/dz = x/(x^2 + y^2) + i(−y/(x^2 + y^2))

  d(Ln z)/dz = (x − iy)/(x^2 + y^2)

TRICK: Use the notable product (difference of two squares) to decompose the denominator:

  (x + iy)(x − iy) = x^2 − (iy)^2 = x^2 − (√−1)^2 y^2 = x^2 − (−1)y^2 = x^2 + y^2

Thus, we have:

  d(Ln z)/dz = (x − iy)/[(x + iy)(x − iy)]

  d(Ln z)/dz = 1/(x + iy)

Recall: z = x + iy, so:

  d(Ln z)/dz = 1/z
• So it does not exist for z = 0. Now, we only need to check if the derivative does not exist for any other
values
• We proceed to check the Cauchy-Riemann equations:

ux (x, y) = vy (x, y)

vx (x, y) = −uy (x, y)

• We have that:

  u_y(x, y) = (1/2) · 2y/(x^2 + y^2)

  u_y(x, y) = y/(x^2 + y^2)

  v_y(x, y) = [1/(1 + (y/x)^2)] · (1/x)

  v_y(x, y) = [x^2/(x^2 + y^2)] · (1/x)

  v_y(x, y) = x/(x^2 + y^2)

• Thus we have that the Cauchy-Riemann equations hold for any values of x and y:

  u_x(x, y) = x/(x^2 + y^2) = v_y(x, y)

  v_x(x, y) = −y/(x^2 + y^2) = −u_y(x, y)

Thus, the derivative exists as long as x2 + y 2 ̸= 0, which is implied by z ̸= 0, which is also implied by
|z| > 0 that comes from the condition for the natural logarithm to be defined
• Therefore,
1. the natural logarithm of z is complex differentiable for all z ∈ C\L
2. the natural logarithm of z is analytic-regular-holomorphic for all z ∈ C\L
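A finite-difference sketch (Python, arbitrary point away from the branch cut L) checking that d(Ln z)/dz = 1/z:

```python
import cmath

# Numerical derivative of the principal logarithm at a point off the nonpositive real axis
z0 = 2.0 - 1.5j
h = 1e-6
deriv = (cmath.log(z0 + h) - cmath.log(z0 - h)) / (2 * h)

print(deriv)      # approximately 0.32 + 0.24j
print(1 / z0)     # 0.32 + 0.24j
```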

2.2 The natural exponential of a complex number


In complex analysis, the exponential of complex number z is

f (z) = ez
• When is it undefined? If z ∈ R, we know it is always defined
• Would that change now that z ∈ C?
• Recall z = x + iy, so

f (z) = ex+iy

f (z) = ex eiy

From Euler’s Theorem, since y ∈ R:

eiy = cos y + i sin y

Sine and Cosine are defined over every real number and the ex is defined for any x ∈ R. Thus f (z) is
defined for any x, y ∈ R
• Thus, the natural exponential function of a complex number is always defined for any z ∈ C

2.2.1 Complex differentiability
For any z ∈ C, we have that the natural exponential function is a well-defined function:

f (z) = ez
• Recall z = x + iy, so to find the derivative, we prefer to analyze

f (x, y) = u(x, y) + iv(x, y)

• For rewriting f (z) as f (x, y), we proceed to replace z:

f (x, y) = ex eiy

From Euler’s Theorem, since y ∈ R:

eiy = cos y + i sin y

Then, we have that:

f (x, y) = ex (cos y + i sin y)

f (x, y) = ex (cos y) + i(ex sin y)

f (x, y) = u(x, y) + iv(x, y)

u(x, y) = ex (cos y)

v(x, y) = ex sin y

• Remember we have that

f ′ (z) = ux (x, y) + ivx (x, y)

Also, notice that:

d
sin(x) = cos(x)
dx

d
cos(x) = − sin(x)
dx
Thus, we have

ux (x, y) = ex (cos y)

vx (x, y) = ex sin y

• Therefore we have:

f ′ (z) = ux (x, y) + ivx (x, y)

d(ez )
= ex (cos y) + iex sin y
dz

d(ez )
= ex (cos y + i sin y)
dz
Thus, we have:

d(ez )
= ex eiy
dz

d(ez )
= ex+iy
dz
Recall: z = x + iy, so:

d(ez )
= ez
dz
• As the derivative is the natural exponential function itself, it is defined for all z ∈ C. To double-check, we proceed to check the Cauchy-Riemann equations
• We proceed to check the Cauchy-Riemann equations:

ux (x, y) = vy (x, y)

vx (x, y) = −uy (x, y)

• We have that:

uy (x, y) = ex (− sin y)

vy (x, y) = ex (cos y)

• Thus we have that the Cauchy-Riemann equations hold for any values of x and y:

ux (x, y) = ex (cos y) = vy (x, y) = ex (cos y)

vx (x, y) = ex (sin y) = −uy (x, y) = −ex (− sin y)

Thus, the derivative exists for any z ∈ C

• Therefore,
1. the natural exponential of z is complex differentiable for all z ∈ C
2. the natural exponential of z is analytic-regular-holomorphic for all z ∈ C

2.3 The complex valued function: (1 − αz)−1


Consider α ∈ C, so as to be as general as possible. This function is useful for analyzing an AR process with ρ = α:

  f(z) = 1/(1 − αz)

• When is it undefined? If z ∈ R, we know it is not defined when z = 1/α

• Would that change now that z ∈ C?

• No, since the only way f(z) is not defined is when the denominator is 0, which only occurs when z = 1/α

2.3.1 Complex differentiability


For any z ∈ C\{α−1 }, we have the following well-defined function:

  f(z) = 1/(1 − αz)
• As usual, we would like to find a way to rewrite f (z) as

f (x, y) = u(x, y) + iv(x, y)

However, this time it is not easy!


• TRICK: to analyze the differentiability of f (z) we will use the natural exponential and natural loga-
rithm:

f (z) = eln f (z)

Thus,

  f(z) = e^{ln[1/(1 − αz)]}

  f(z) = e^{ln 1 − ln(1 − αz)}

  f(z) = e^{−ln(1 − αz)}
– The natural exponential function, ez , is complex differentiable for all z ∈ C
– However, the natural logarithm, ln z, is not complex differentiable for all z ∈ C
• so, we only need to analyze the differentiability of ln (1 − αz)

• To get a single-value function, we analyze the principal logarithm:

Ln(1 − αz) = ln |1 − αz| + i Arg(1 − αz)

Thus, ln (1 − αz) is defined if

|1 − αz| > 0

Recall |.| is always positive. Thus we only need:

1 ̸= αz

That is,

  z ≠ 1/α
• To find the derivative we can follow the same analysis as in 2.1.2 by analyzing:

ln(z ∗ ) ≡ ln(1 − αz)

, in which:

z ∗ = (1 − αx) + i(−αy)

• We know that Ln(z ∗ ) is not continuous nor well defined for

Lz∗ = {z ∗ ∈ C : Re(z ∗ ) ≤ 0 and Im(z ∗ ) = 0}

Recall that the real part of z* is 1 − αx. So,

  Re(z*) ≤ 0

  1 − αx ≤ 0

  x ≥ 1/α

Recall that z = x + iy, so the real part of z is x. From above we have that:

  x ≥ 1/α

  Re(z) ≥ 1/α

The imaginary part being zero in z* is the same as the imaginary part being zero in z. So, by rewriting L in terms of z, we have that:

  L = {z ∈ C : Re(z) ≥ 1/α and Im(z) = 0}

Thus, Ln(1 − αz) is well defined and continuous for all z ∈ C\L
• We know that Ln z ∗ is complex differentiable for all z ∗ ∈ C\Lz∗ .
• Thus, we have that Ln(1 − αz) is complex differentiable for all z ∈ C\L
• That is, Ln(1 − αz) is analytic for all z ∈ C\L
• Therefore, f(z) = 1/(1 − αz) is analytic for all z ∈ C\L

2.3.2 Taylor Expansion


For any z ∈ C\L, we have the following function:

  f(z) = 1/(1 − αz)
• is well defined
• is analytic
• As f (z) is analytic everywhere inside C\L, we can use Taylor’s Theorem to express it as a power series
with z0 ∈ C\L:


  f(z) = Σ_{n=0}^{∞} a_n (z − z_0)^n

  f(z) = Σ_{n=0}^{∞} [f^(n)(z_0) / n!] (z − z_0)^n

where the series converges on any disk |z − z0 | < r contained in C\L


• Recall that r is the radius of convergence, and we would like to find it so that we know where the power series converges

2.3.3 Convergence of the Taylor Expansion


• We are interested in finding r so we will use the Ratio Test
• First we need to choose z0 ∈ C\L
• Since 0 ∈ C\L, we select z0 = 0
• Now, we can proceed to use the ratio test, but we can use the same trick as when analyzing the
differentiability of f (z).
• So to find r when z0 = 0 for f (z) is the same as finding it for

g(z) = Ln(1 − αz)

• We know that g(z) is analytic everywhere inside C\L, so we can use the Taylor expansion around
z0 = 0:

  ln(1 − αz) = −(α/1) z^1 − (α^2/2) z^2 − (α^3/3) z^3 − (α^4/4) z^4 − · · · − (α^n/n) z^n − · · ·

So,

  ln(1 − αz) = −Σ_{n=1}^{∞} (α^n/n) z^n

• To apply the Ratio test, we need to find the limit of the absolute ratios of consecutive terms, L:

  L = lim_{n→∞} |(α^{n+1}/(n+1)) z^{n+1}| / |(α^n/n) z^n|

  L = lim_{n→∞} |α (n/(n+1)) z|

I can take |αz| out since it does not depend on n, so it is not affected by the limit:

  L = |αz| lim_{n→∞} n/(n+1)

Now, I can use L’Hôpital’s rule5

  L = |αz| lim_{n→∞} 1/1

  L = |αz| |1|

  L = |αz|

So, the limit, L, exists. Recall for convergence, we need

  L < 1

which can only be achieved if:

  |αz| < 1
5 If lim_{x→c} f(x) = lim_{x→c} g(x) = 0 or ±∞, and g′(x) ≠ 0 for all x in I with x ≠ c, and lim_{x→c} f′(x)/g′(x) exists, then lim_{x→c} f(x)/g(x) = lim_{x→c} f′(x)/g′(x)

That is, if:

|α||z| < 1

|z| < |α|−1

• Thus, we have that the series g(z) and therefore the series f (z) converge for any z such that |z − 0| <
|α|−1 .
• One big difference between this function and 1/(1 − z) is that convergence can be achieved when |z| = 1 (i.e., z is on the unit circle) as long as

|α| < 1

This is an extremely important result as we will see when analyzing the lag operator.
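To see this numerically, here is a minimal sketch (not part of the original notes) comparing a truncated version of the power series for ln(1 − αz) with a direct evaluation; the particular values of α (with |α| < 1) and of the point z on the unit circle are arbitrary choices for illustration:

```python
import numpy as np

# Truncated series for ln(1 - alpha*z) = -sum_{n>=1} (alpha^n / n) z^n,
# evaluated at a point z on the unit circle with |alpha| < 1.
alpha = 0.8
z = np.exp(1j * 0.7)          # |z| = 1

partial = sum((alpha**n / n) * z**n for n in range(1, 2000))
print(-partial)                # truncated series value
print(np.log(1 - alpha * z))   # direct evaluation (principal logarithm)
```

With |α| < 1 the two numbers agree to machine precision even though |z| = 1, consistent with the convergence result above.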

3 Lag polynomial
• A lag polynomial is a polynomial in which the variable is the lag operator denoted by L
• The lag operator, L (also called the backshift operator, B), operates on an element of a time series to produce the previous element (a short numerical sketch is given at the end of this section):

Lyt = yt−1

• The lag operator (as well as backshift operator) can be raised to arbitrary integer powers so that:

L2 yt = yt−2

L−1 yt = yt+1

• The lag polynomial will be useful when analyzing the properties of a time series.
• For

Σ_{j=−∞}^{∞} ψj Xt−j

we can express it as:

= Σ_{j=−∞}^{∞} ψj L^j Xt

= ψ(L)Xt

So, ψ(L) is the lag polynomial.

• As it is a polynomial, we will use all the tools learned in Section 1 and 2. So we would like to see for
instance if we can apply Taylor’s Theorem:


ψ(L) = Σ_{j=0}^{∞} ψj L^j = 1/(1 − ρL)

• As the time series is a random variable, we will need to use the concept of convergence of random
variables that we will learn in Section 4.
• The idea is to find the conditions on the lag polynomial such that a time series is stationary
• In short, we will try to express a time series in terms of a lag polynomial. As the time series is a
random variable, we will need to analyze convergence in probability or in moments. As we will see
in Section 5, certain features of the moments of the variable can tell us whether the variable is stationary
(e.g., second moment convergence is a necessary but not a sufficient condition for the series to be
stationary). Then, we will work out what conditions we need on the lag polynomial so that
the time series is stationary
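As referenced above, the following is a minimal numerical sketch (not from the original notes) of how the lag operator acts on a finite sample; the helper `lag` and the toy data are hypothetical, and in a finite sample the first lagged observations are simply unavailable (marked here with NaN):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def lag(x, k=1):
    """Apply L^k to a finite sample: returns y_{t-k}, NaN where no observation exists."""
    out = np.full_like(x, np.nan, dtype=float)
    if k >= 0:
        out[k:] = x[:len(x) - k] if k > 0 else x
    else:
        out[:k] = x[-k:]
    return out

print(lag(y, 1))   # [nan, 1, 2, 3, 4]   -> L y_t = y_{t-1}
print(lag(y, 2))   # [nan, nan, 1, 2, 3] -> L^2 y_t = y_{t-2}
print(lag(y, -1))  # [2, 3, 4, 5, nan]   -> L^{-1} y_t = y_{t+1}
```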

4 Convergence of Random Variables


• In some situations, we would like to see if a sequence of random variables X1 , X2 , X3 , · · · ”converges”
to a random variable X
• That is, we would like to see if Xn gets closer and closer to X in some sense as n increases
• For example, suppose that we are interested in knowing the value of a random variable X, but we are
not able to observe X directly
• Instead, you can do some measurements and come up with an estimate of X : call it X1 .
• You then perform more measurements and update your estimate of X and call it X2 .
• You continue this process to obtain X1 , X2 , X3 , · · · Your hope is that as n increases, your estimate gets
better and better.
• That is, you hope that as n increases, Xn gets closer and closer to X. In other words, you hope that
Xn converges to X.
• In fact, limit theorems (the weak law of large numbers (WLLN) and the central limit theorem (CLT))
use the concept of convergence.
• The WLLN states that the average of a large number of i.i.d. random variables converges in probability
to the expected value.
• The CLT states that the normalized average of a sequence of i.i.d. random variables converges in
distribution to a standard normal distribution
• In this section, we will develop the theoretical background to study the convergence of a sequence of
random variables in more detail
• In particular, we will define different types of convergence.

4.1 Convergence of a Sequence of Numbers


• If we have a sequence of real numbers a1 , a2 , a3 , · · · , we can ask whether the sequence converges

• For example, the sequence

1/2, 2/3, 3/4, ··· , n/(n+1), ···

is defined as

an = n/(n+1), for n = 1, 2, 3, ···

We say that a sequence a1 , a2 , a3 , · · · converges to a limit L if an approaches L as n goes to infinity.


• That is, a sequence a1 , a2 , a3 , · · · converges to a limit L if

lim an = L
n→∞

That is, for any ϵ > 0, there exists an N ∈ N such that

|an − L| < ϵ, for all n > N

4.2 Types of Convergence for Sequences of Random Variables


• Unlike the non-random variables, there are different types of convergence when dealing with random
variables
• Consider a sequence of random variables X1 , X2 , X3 , · · · , i.e, {Xn , n ∈ N} .
• There are four types of convergence:
– Convergence in distribution
– Convergence in probability
– Convergence in r-th moment
– Almost sure convergence
• A sequence might converge in one sense but not in another one.
• Some of these convergence types are ”stronger” than others and some are ”weaker.”
• Figure 4 summarizes how these types of convergence are related

Figure 4: Relation between different types of convergence

4.3 Convergence in Probability
• Let {Xn : n ≥ 1} be a random sequence of vectors on Rk
• That is, k elements in each vector. That is, k random variables.
• When k = 1, we are in the simple case of a sequence of random variables (Each vector is one dimensional,
which means that each vector is only one variable)
• Let X be a random vector on Rk
• Xn is said to converge in probability to X, denoted by:

Xn →p X,

if, as n → ∞

P (|Xn − X| > ε) → 0

for any
ε>0

where | · | is the usual Euclidean norm (also called the L2 norm), which gives the ordinary distance
from the origin to the point X—a consequence of the Pythagorean theorem:
|X| = √( x1² + · · · + xk² )

, in which √· denotes the positive root
• Equivalently, Xn is said to converge in probability to X, if

lim_{n→∞} P (|Xn − X| > ε) = 0

for any
ε>0

• Equivalently, Xn is said to converge in probability to X, if for any δ, ε > 0, there exists N ∈ N such
that,

P (|Xn − X| > ε) ≤ δ

for all n ≥ N
• It is worth pointing out that Markov’s inequality is very useful in many proofs involving convergence
in probability.

4.3.1 Some intuition
• Let’s say X is a constant, something finite
• Let’s sat Xn is simple sum:

n
X
Xn = xi
i=1

• As n goes to infinity, we have an infinite sum


• So, we would like to know if the infinite sum converges
• That is, we are interested in knowing if the infinite sum is finite or not
• Since the variable inside the sum is a random variable, we now have to think about what kind of convergence we are checking
• Recall, we still want the infinite sum to converge to something finite.
• We cannot be sure of it because we have a random variable!
• However we can talk in probabilities.
• Thus, we would like to check if the infinite sum converges to something finite as n goes to infinity with
probability close to 1.
• In other words, we would like to check if the probability of the infinite sum being different than
something finite goes to 0 as n goes to infinity.
• That is exactly the definition of convergence in probability!
• You might ask why the Euclidean norm?
• Well, it is just part of the definition. We are checking the difference between two elements, which could be either positive or negative.
• Thus, the Euclidean norm appears there basically to ensure that the distance between the two elements is positive
• That is why ϵ is any positive real number
• We are just saying, the probability of the distance between two elements being greater than any positive
number should go to zero as n increases
• In other words, the probability of the distance between two elements being equal to zero should go to
1 as n increases.

4.3.2 Markov’s Inequality


• For any random variable X,

P(|X| > ε) ≤ E[|X|^q] / ε^q, ∀ q, ε > 0
where | · | is the usual Euclidean norm
• If the RHS goes to 0 as n goes to infinity, then the LHS does it too.
• Thus, we can see directly how the convergence in the first moment, q = 1, implies convergence in
probability
• That is why Markov’s inequality is useful in many proofs involving convergence in probability.

4.4 Convergence in probability of an Infinite Sum of Random Variables: Absolutely Summable
Theorem 3 If {Xt } is any sequence of random variables such that

sup_t E|Xt| < ∞,

and if

Σ_{j=−∞}^{∞} |ψj| < ∞,

then the series

Yt = ψ(L)Xt = Σ_{j=−∞}^{∞} ψj L^j Xt = Σ_{j=−∞}^{∞} ψj Xt−j

converges.
Proof
• We will skip this one since Theorem 4 implies Theorem 3

4.5 Convergence in q th moment


• Let {Xn : n ≥ 1} be a sequence of random vectors with k elements
• Let X be a random vector on Rk
• Xn is said to converge in q-th moment to X, with q ≥ 1, denoted:

Xn →Lq X

if, as n → ∞

E[|Xn − X|^q] → 0

• By Jensen’s inequality we can show that convergence in higher moments imply convergence in lower
moments

4.5.1 Convergence in 1st moment


• Xn is said to converge in 1st moment to X, denoted:

Xn →L1 X

if, as n → ∞

E[|Xn − X|] → 0

• Let’s say X is something finite, then we have that Xn converges in first moment to something finite
• Thus, we have that:

E [|Xn |] < ∞

as n goes to infinity.
• Recall that we define expectations by sums or integrals only if they are absolutely summable or
integrable.
• Therefore, we have that a random variable Y has expectation only if |Y | has expectation:

E[Y] = Σ_y y p(y) < ∞

if and only if

E[|Y|] = Σ_y |y| p(y) < ∞

• Thus, we have that

E [Xn ] < ∞
as n goes to infinity.

4.5.2 Convergence in 2nd moment


• This convergence is also known as mean square convergence or convergence on L2
• Xn is said to converge in 2nd moment to X, denoted:

Xn →L2 X

if, as n → ∞

E[|Xn − X|²] → 0

• Let’s say X is something finite, then we have that Xn converges in second moment to something finite.
• Thus, we have that the second moment of Xn exists.
• This convergence carries an important implication: the variance exists.
• 2nd moment convergence implies 1st moment convergence. Thus, the first moment is also finite.
• The variance of Xn is given by:

Var(Xn ) = E[(Xn )2 ] − (E[Xn ])2

By definition of the expectation, we know that the expectation of Xn² exists if and only if the expectation of |Xn|² exists. Thus, we have that the first component of the variance exists. The second component exists because convergence in second moment implies that the expectation of |Xn| exists, which implies that the expectation of Xn exists, and so does the square of this expectation.

4.6 Convergence in second moment of an Infinite Sum of Random Variables: Absolutely Summable
Theorem 4 If {Xt } is any sequence of random variables such that

sup_t E|Xt|² < ∞

and if

Σ_{j=−∞}^{∞} |ψj| < ∞,

then, the series

Yt = ψ(L)Xt = Σ_{j=−∞}^{∞} ψj L^j Xt = Σ_{j=−∞}^{∞} ψj Xt−j

converges in mean square.

4.6.1 Usefulness of this exercise


• This example is similar to the one in the previous section.
• It differs on the fact that now we are given that the second moment of Xt is finite instead of the first
moment.
• We know that the second moment being finite implies that the first moment is also finite.
• Thus, given the above conditions, this Theorem states that Yt has a finite second and first moment
even though it is an infinite sum of random variables.

4.6.2 Proof
• In this case we want to show that the infinite sum converges in mean square.
• That is, we want to show that the infinite sum converges in second moment to something finite.
• We will denote that something finite as K.
• Thus, we want to check if the following holds:

E[ | Σ_{j=−n}^{n} ψj Xt−j − K |² ] → 0

as n goes to infinity.
• This is the same trick used in Proposition 3.1.1 in Brockwell and Davis (1991)

• Let’s say K is the finite variance if it exists. So we are actually only requiring
 
2

X
ψj Xt−j  = Finite
 
E
j=−∞

Now recall the definition of having a finite second moment: E|Z|2 < ∞.
Notice that E|Z|2 = E(Z)2 only for Z ∈ R.
If Z ∈ C, then we cannot say that because of the definition of a modulus.
I will explain it in more detail in the following lines.
Thus, we are looking for:
 
2

X
ψj Xt−j  < ∞
 
E
j=−∞

Again, we would love to use here the beautiful Triangle inequality, but this time we have the
Euclidean norm to the power of 2, so we cannot.
For this case, we will use the beautiful property of the |.|2
• Recall that the Euclidean norm or L-2 norm (or distance) of a k-dimensional vector x is given by

|x| = √( Σ_{i=1}^{k} xi² )

, in which xi denotes an element of the k-dimensional vector. Thus, the square of the L-2 norm is:

|x|² = ( √( Σ_{i=1}^{k} xi² ) )² = Σ_{i=1}^{k} xi²

, so it is like cancelling the absolute value.


However, that is only useful if x is a real vector.
If the entries are complex numbers, then the modulus involves the complex conjugates:

|z| = √( Σ_{i=1}^{k} |zi|² ) = √( z1 z̄1 + · · · + zk z̄k )

since, for any complex number z:

|z|² = z z̄

• Notice that in this case, we have a one-dimensional vector.
Let's allow the random variable and the coefficients to be complex numbers.
Then, using the definition of the Euclidean (L-2) norm, a direct application of the above gives:

E[ | Σ_{j=−n}^{n} ψj Xt−j |² ] = E[ ( Σ_{j=−n}^{n} ψj Xt−j ) ( Σ_{j=−n}^{n} ψ̄j X̄t−j ) ]

= E[ Σ_{j=−n}^{n} Σ_{k=−n}^{n} ψj ψ̄k Xt−j X̄t−k ]

= Σ_{j=−n}^{n} Σ_{k=−n}^{n} ψj ψ̄k E[ Xt−j X̄t−k ]

Notice that the last term looks like a covariance so we would like to use the Covariance Inequality.
It is not a covariance because it does not include the means.
• The Covariance Inequality for real random variables states that:

(Cov(Y, X))2 ≤ Var(X) Var(Y )

, which comes directly by applying Cauchy-Schwarz inequality, which can be written using only
the inner product:

|⟨X, Y ⟩|2 ≤ ⟨X, X⟩⟨Y, Y ⟩

or by taking the square root of both sides of the above inequality, the Cauchy–Schwarz inequality it
can be written using the norm and inner product :

|⟨X, Y ⟩| ≤ ∥X∥∥Y ∥.

when defining the inner product as:

⟨X, Y ⟩ = E(XY )

and therefore the norm as:

p p
∥X∥ = ⟨X, X⟩ = E(XX)

Specifically, it gives us:

|E(XY)|² ≤ E[X²] E[Y²]
but we can easily redefine x and y such that µ = E(X) and ν = E(Y ), then:

|Cov(X, Y)|² = |E((X − µ)(Y − ν))|²
= |⟨X − µ, Y − ν⟩|²
≤ ⟨X − µ, X − µ⟩⟨Y − ν, Y − ν⟩
= E[(X − µ)²] E[(Y − ν)²]
= Var(X) Var(Y)

For the case of complex random variables, we also proceed to use the Cauchy-Schwarz
inequality, defining the inner product as⁶:

⟨Z, W⟩ = E(Z W̄)

so,

⟨Z, Z⟩ = E(Z Z̄) = E(|Z|²)

and the norm as:

‖Z‖ = √⟨Z, Z⟩ = √E(Z Z̄) = √E(|Z|²)

In this case we have:

|E[Z W̄]|² ≤ E[|Z|²] E[|W|²]

So,

|E[Z W̄]| ≤ √( E[|Z|²] E[|W|²] )

• Recall we want to know if the following is finite (converges):

Σ_{j=−n}^{n} Σ_{k=−n}^{n} ψj ψ̄k E[ Xt−j X̄t−k ]

We know that the above converges if the following converges:

Σ_{j=−n}^{n} Σ_{k=−n}^{n} | ψj ψ̄k E[ Xt−j X̄t−k ] |

Now by applying the Cauchy-Schwarz inequality for complex random variables, we have that:

Σ_{j=−n}^{n} Σ_{k=−n}^{n} | ψj ψ̄k E[ Xt−j X̄t−k ] | ≤ Σ_{j=−n}^{n} Σ_{k=−n}^{n} |ψj ψ̄k| √( E[|Xt−j|²] E[|Xt−k|²] )

6 It can also be shown using Triangle inequality and Hölder’s inequality

We would like to have Xt i.i.d. with E|Xt|² = θ < ∞, so that E[|Xt−j|²] = E[|Xt−k|²] = θ and we
could factor it out of the sum.
However, we do not have an i.i.d. process, since this is a more general case, but we do have that:

sup_t E[|Xt|²] < ∞

, which means that the largest possible value of E|Xt|² over t is finite.
Thus, we can take the extreme case in which all the E|Xt|² equal that supremum:

Σ_{j=−n}^{n} Σ_{k=−n}^{n} |ψj ψ̄k| √( E[|Xt−j|²] E[|Xt−k|²] ) ≤ Σ_{j=−n}^{n} Σ_{k=−n}^{n} |ψj ψ̄k| √( sup_t E[|Xt|²] · sup_t E[|Xt|²] )

Σ_{j=−n}^{n} Σ_{k=−n}^{n} |ψj ψ̄k| √( E[|Xt−j|²] E[|Xt−k|²] ) ≤ Σ_{j=−n}^{n} Σ_{k=−n}^{n} |ψj ψ̄k| sup_t E[|Xt|²]

so we can factor the supremum out of the double sum (since all the expectations are being replaced by the same supremum, it is no longer indexed by j or k):

Σ_{j=−n}^{n} Σ_{k=−n}^{n} |ψj ψ̄k| √( E[|Xt−j|²] E[|Xt−k|²] ) ≤ sup_t E[|Xt|²] Σ_{j=−n}^{n} Σ_{k=−n}^{n} |ψj ψ̄k|

Recall that |z|² = z z̄ and that |ψ̄k| = |ψk|, so the double sum factorizes:

Σ_{j=−n}^{n} Σ_{k=−n}^{n} |ψj ψ̄k| = ( Σ_{j=−n}^{n} |ψj| ) ( Σ_{k=−n}^{n} |ψ̄k| ) = ( Σ_{j=−n}^{n} |ψj| )²

By the Triangle Inequality, | Σ_{j=−n}^{n} ψj | ≤ Σ_{j=−n}^{n} |ψj|, so this quantity also bounds | Σ_{j=−n}^{n} ψj |².

Thus, we can rewrite the bound as:

Σ_{j=−n}^{n} Σ_{k=−n}^{n} |ψj ψ̄k| √( E[|Xt−j|²] E[|Xt−k|²] ) ≤ sup_t E[|Xt|²] ( Σ_{j=−n}^{n} |ψj| )²

So, we just need to check if the RHS is finite as n goes to infinity:

sup_t E[|Xt|²] ( Σ_{j=−∞}^{∞} |ψj| )²

Recall we are given that:

sup_t E[|Xt|²] < ∞

and

Σ_{j=−∞}^{∞} |ψj| < ∞

, which implies

( Σ_{j=−∞}^{∞} |ψj| )² < ∞

Therefore, we have:

sup_t E[|Xt|²] ( Σ_{j=−∞}^{∞} |ψj| )² < ∞

which implies that, as n goes to infinity:

1.
sup_t E[|Xt|²] ( Σ_{j=−n}^{n} |ψj| )² < ∞

2.
Σ_{j=−n}^{n} Σ_{k=−n}^{n} |ψj ψ̄k| √( E[|Xt−j|²] E[|Xt−k|²] ) < ∞

3.
Σ_{j=−n}^{n} Σ_{k=−n}^{n} | ψj ψ̄k E[Xt−j X̄t−k] | < ∞

4.
| Σ_{j=−n}^{n} Σ_{k=−n}^{n} ψj ψ̄k E[Xt−j X̄t−k] | < ∞

5.
Σ_{j=−n}^{n} Σ_{k=−n}^{n} ψj ψ̄k E[Xt−j X̄t−k] < ∞

6.
E[ | Σ_{j=−n}^{n} ψj Xt−j |² ] < ∞

lim_{n→∞} E[ | Σ_{j=−n}^{n} ψj Xt−j |² ] = Finite

7. Since K denotes the finite variance, then, we have, as n goes to infinity:

E[ | Σ_{j=−n}^{n} ψj Xt−j − K |² ] → 0

or

Σ_{j=−n}^{n} ψj Xt−j →L2 X

• Thus, we have that the infinite sum converges in mean square/second moment/L-2
• Thus, it converges in the first moment, which implies convergence in probability.
• Notice that for the variance of the infinite sum to be finite we haven't needed Σ_{j=−∞}^{∞} |ψj|² < ∞; it was enough to have Σ_{j=−∞}^{∞} |ψj| < ∞⁷ as long as the second moment of Xt is finite.
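A small numerical sketch of the result (hypothetical choices: ψj = ρ^j with |ρ| < 1, which is absolutely summable, and Xt i.i.d. standard normal so that sup_t E|Xt|² = 1): the variance of the truncated sum settles down as more terms are added, consistent with convergence in mean square.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.9
x = rng.standard_normal(200_000)           # i.i.d., so sup_t E|X_t|^2 = 1 < infinity

def truncated_sum(n):
    """Realizations of sum_{j=0}^{n} psi_j X_{t-j} with psi_j = rho^j (absolutely summable)."""
    psi = rho ** np.arange(n + 1)
    return np.convolve(x, psi, mode="valid")

# The variance of the mean-square limit is sum_j psi_j^2 = 1/(1 - rho^2) ~= 5.26 here.
for n in (5, 20, 50, 100):
    print(n, round(truncated_sum(n).var(), 3), round(1.0 / (1.0 - rho**2), 3))
```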

4.7 Convergence in second moment of an Infinite Sum of Random Variables: Square Summable
It seems that checking for an infinite sum to be absolutely summable is the only way we can get convergence
in second moment.
However, it is not the case!
Actually for convergence in second moment of the infinite sum we need a weaker condition: the infinite sum
to be square summable.
7 This is because any absolute-summable series is square-summable.

Theorem 5 If {Xt } is any sequence of random variables such that

sup_t E|Xt|² < ∞

and if

Σ_{j=−∞}^{∞} |ψj|² < ∞,

then the series

Yt = ψ(L)Xt = Σ_{j=−∞}^{∞} ψj L^j Xt = Σ_{j=−∞}^{∞} ψj Xt−j

converges in mean square.

4.7.1 Usefulness of this exercise


• This condition will be relevant when dealing with fractionally integrated series or long memory processes
• A long memory process can be expressed as an infinite sum that is not absolutely summable when
d > 0. We could then wrongly conclude that it does not have a finite variance when d > 0, which
would imply that the series is not stationary (having a finite second moment is a condition for stationarity).
• However, for 0 < d < 0.5, the infinite sum is square summable, which lets us conclude that the process
has a finite second moment.
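A quick numerical sketch of the distinction (the weights ψj ∝ j^(d−1) are a hypothetical stand-in for long-memory coefficients): for 0 < d < 0.5 the absolute sum keeps growing as more terms are included, while the sum of squares settles down.

```python
import numpy as np

d = 0.3
for n_terms in (10**4, 10**5, 10**6):
    j = np.arange(1, n_terms + 1, dtype=float)
    psi = j ** (d - 1.0)                  # psi_j ~ j^(d-1) with d = 0.3
    print(n_terms, round(np.sum(np.abs(psi)), 2), round(np.sum(psi**2), 4))
# First column (absolute sum) keeps growing; second column (sum of squares) stabilizes.
```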

4.7.2 Proof
For a more advanced time series econometric course!

5 Stationary Time Series


5.1 Strict Stationarity

• The time series {Xt , t ∈ Z} is said to be strictly stationary if the joint distributions of (Xt1 , . . . , Xtk )

and (Xt1 +h , . . . , Xtk +h ) are the same for all positive integers k and for all t1 , . . . , tk , h ∈ Z
• Strict stationarity means intuitively that the graphs over two equal-length time intervals of a realization
of the time series should exhibit similar statistical characteristics. For example, the proportion of
ordinates not exceeding a given level x should be roughly the same for both intervals.

5.2 Covariance Stationarity


The time series {Xt , t ∈ Z} , with index set Z = {0, ±1, ±2, . . .}, is said to be stationary if
1. For all t ∈ Z, the second moment exists:

E|Xt|² < ∞

2. For all t ∈ Z, the mean is independent of t:

EXt = m

The first condition, implies that first moment exists. However this condition requires it to be indepen-
dent of t and constant over time, which implies that the mean is the same for all Xt , ∀t ∈ Z
3. Let’s define the autocovariance function as a function of two variables:

γx (r, s) = Cov (Xr , Xs ) = E [(Xr − EXr ) (Xs − EXs )] , r, s ∈ T

Then, for all r, s and t ∈ Z, the autocovariance between Xt and Xt+s must only be a function of the
distance between the two observations, s:

γx (t, t + s) = g(s)

, such that:

γx (r, s) = γx (r + t, s + t) = g(s − r) = g(r − s)

since the covariance between Xr+t and Xs+t is only a function of the distance between both
observations, s − r, which is the same as the distance between the observations Xr and Xs. Since both
covariances involve the same distance, they should be equal in order for the process to be stationary.

5.3 The Autocovariance and Autocorrelation Function


• If {Xt , t ∈ Z} is stationary then

γx (r, s) = γx (r − s, 0)

for all r, s ∈ Z.
• Thus, it is convenient to redefine the autocovariance function of a stationary process as the function
of just one variable:

γx (h) ≡ γx (h, 0) = Cov (Xt+h , Xt )

for all t, h ∈ Z
• The function γx (·) will be referred to as the autocovariance function of {Xt }
• γx (h) as the value of the autocovariance function of {Xt } at lag h.
• The autocorrelation function (acf) of {Xt } is defined analogously as the function whose value at lag h
is:

ρx(h) ≡ γx(h)/γx(0) = Corr(Xt+h, Xt)

for all t, h ∈ Z
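As a practical illustration (not part of the original notes), the following sketch computes the sample autocovariance and autocorrelation functions of a simulated stationary AR(1) series; the parameter values are arbitrary, and the estimator divides by n, a common convention:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocovariance gamma_hat(h) and autocorrelation rho_hat(h), h = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    n = len(x)
    gamma = np.array([np.sum(xc[h:] * xc[:n - h]) / n for h in range(max_lag + 1)])
    return gamma, gamma / gamma[0]

rng = np.random.default_rng(2)
rho = 0.7
e = rng.standard_normal(10_000)
x = np.zeros_like(e)
for t in range(1, len(e)):                 # stationary AR(1): X_t = rho X_{t-1} + eps_t
    x[t] = rho * x[t - 1] + e[t]

gamma, acf = sample_acf(x, 5)
print(acf)                                  # roughly rho**h = [1, 0.7, 0.49, ...]
```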

5.4 Complex-Valued Stationary Time Series
• Processes encountered in practice are nearly always real-valued
• However, it is mathematically simpler in spectral analysis to treat them as special cases of complex-
valued processes.
The process {Xt } is a complex-valued stationary process if
1. For all t ∈ Z, the second moment exists:

E|Xt|² < ∞

2. For all t ∈ Z, the mean is independent of t:

E[Xt] = m

3. For all t ∈ Z, the autocovariance is independent of t:

Cov[Xt+h, Xt] = E[(Xt+h − E[Xt+h])(X̄t − E[X̄t])]

= E[Xt+h X̄t] − E[Xt+h]E[X̄t]

Given that for the process to be stationary we need condition 2: EXt+h = E X̄t = m, then for condition
3, we only need:


E Xt+h X̄t to be independent of t

This is what makes it different from our usual definition with real-valued numbers: the fact that when
dealing with complex-valued random variables, the covariance formula requires the conjugate of the
variables.

5.5 The Complex-Valued Autocovariance Function


• The autocovariance function γ(·) of a complex-valued stationary process {Xt } is:


γ(h) = E Xt+h X̄t − EXt+h E X̄t

for all t, h ∈ Z
• The function γx (·) will be referred to as the autocovariance function of {Xt } and has the following
properties8 :

γ(0) ≥ 0
|γ(h)| ≤ γ(0) for all integers h
γ(·) is a Hermitian function (i.e. γ(h) = γ̄(−h), the bar denoting the complex conjugate)
⁸ A Hermitian function is a complex function with the property that its complex conjugate is equal to the original function
with the variable changed in sign: f̄(x) = f(−x)

5.6 Stationarity of an Infinite sum of random variables (Real-valued/complex-
valued)
Theorem 6 If
1. {Xt} is a stationary process with autocovariance function γX(·)
2.
Σ_{j=−∞}^{∞} |ψj| < ∞,
3.
Yt = Σ_{j=−∞}^{∞} ψj Xt−j

Then, for X ∈ R, the process {Yt} is stationary with autocovariance function:

γY(h) = Σ_{j,k=−∞}^{∞} ψj ψk γX(h − j + k)

For X ∈ C with zero mean, the process {Yt} is stationary with autocovariance function:

γY(h) = E[Yt+h Ȳt] = Σ_{j=0}^{∞} Σ_{k=0}^{∞} ψj ψ̄k γX(h − j + k), h = 0, ±1, . . .

5.6.1 Proof
• If {Xt} is stationary, we have that

E|Xt| = c

where c is finite and independent of t, and that its second moment is finite. Thus we can apply Theorems
3 and 4
• From Theorem 3 and Theorem 4, we know that

Σ_{j=−∞}^{∞} ψj Xt−j

converges in second and first moment. Thus, Yt has a finite second moment, which is the first require-
ment to be stationary.
• Now we need to check its first moment. We know that its first moment is finite, but is it also time
independent?:

E[Yt] = lim_{n→∞} Σ_{j=−n}^{n} ψj E[Xt−j] = ( Σ_{j=−∞}^{∞} ψj ) E[Xt] = ( Σ_{j=−∞}^{∞} ψj ) c

Yes, it is! Notice that finding its mean is much simpler than in Theorem 3 because we are not
finding E|Yt|. We know it exists, so E[Yt] must exist, and we just have to find it in the usual easy way.

• Notice that we could also find the variance of Yt by finding the variance of the infinite sum, which is
straightforward and easy given that in this case Var(Xt ) = σ 2
• Now, we just have to find the autocovariance function for Yt:

E(Yt+h Yt) = lim_{n→∞} E[ ( Σ_{j=−n}^{n} ψj Xt+h−j ) ( Σ_{k=−n}^{n} ψk Xt−k ) ]

= Σ_{j,k=−∞}^{∞} ψj ψk [ γX(h − j + k) + (E[Xt])² ]

Thus, E[Yt] and E(Yt+h Yt) are both finite and independent of t. The autocovariance function γY(·) of
{Yt} is given by

γY(h) = E(Yt+h Yt) − E[Yt+h]E[Yt] = Σ_{j,k=−∞}^{∞} ψj ψk γX(h − j + k)

6 Spectral Representation
6.1 Frequency Domain
• Frequency is the number of occurrences of a repeating event per unit of time (e.g. beats per second).
• The frequency domain refers to the analysis of mathematical functions or signals with respect to
frequency, rather than time.
• In signal processing, a signal is a function that conveys information about a phenomenon. In electronics
and telecommunications, it refers to any time varying voltage, current or electromagnetic wave that
carries information

6.2 Frequency Domain versus Time Domain


• A time-domain graph shows how a signal changes over time.
• A frequency-domain graph shows how much of the signal lies within each given frequency band over a
range of frequencies.
• The idea here is to realize that when observing an intensity (e.g. pressure) of a series over time, we
are only looking at the final product but we are not looking at its decomposition.
• Let me explain this idea with a sound. Figure 5 shows the evolution of the air pressure of 3 sounds. A
sound signal represents variations in air pressure over time. The dotted line represents the equilibrium
of each sound
• The sounds D294 (294 beats per second) and A440 (440 beats per second) are pure sine waves. Thus
we can characterize their evolution over time by the sine function.
• The top panel on Figure 5 represents the air pressure generated when both sounds are played at once.
The resulting graph shows that at any point of time the pressure difference will be the sum of the
pressure difference of each note.
• This is something complicated to think about. At some points the peaks match up with each other,
resulting in a really high pressure. At other points, they tend to cancel out.

• Thus, what you get is a wave-ish pressure versus time graph because it is the combination of two pure
frequencies.
• However, the resulting series it is not a pure sine wave.
• Now, imagine that you add more notes, then the resulting series is even more complicated.
• Recall we only see the final series, so is there a way to decompose a final signal (series) into the pure
frequencies that make it up?
• Well, for that we need a a mathematical machine that treats signals with a given frequency differently
from how it treats other signals.
• An example is the Fourier transform, which converts a time function into a sum or integral of sine
waves of different frequencies, each of which represents a frequency component.

Figure 5: Air Pressure of sound over time

6.3 Advantages
• The simplification of the mathematical analysis.

6.4 Fourier Transform - Some Intuition


6.4.1 Winding the original series around a circle
• To start let’s analyze the pure signal with a lowly 3 beats per second (Figure 6)
• The idea is to take the usual series on the time domain and wrap it up around a circle like in Figure 7
• This wrapping up is such that we have a vector (represented by the white arrow) where at each point
in time, t, its length is equal to the height of the original series at t
• Thus, high points of the original graph correspond to a greater distance from the origin on the circular
graph. Consequently, low points end up closer to the origin
• The circle in Figure 7 is drawn in such a way that moving forward 2 seconds in time corresponds to
a single rotation around the circle. That is easy to see since every one second there are 3 beats and

on the circle there are 6 times that the vector reaches its maximum length. So, a single rotation
around the circle lasts 2 seconds
• Let’s define a cycle as the rotation of the vector on the circle. Thus, in this case we have that the
vector on the circle is rotating at 0.5 cycle per second (or 1 cycle per 2 seconds, i.e., 1 rotation every
2 seconds.)
• Thus, at the moment we have two frequencies.
• There’s the frequency of our signal, which goes up and down, three times per second. And then,
separately, there’s the frequency with which we’re wrapping the graph around the circle, which at the
moment is half cycle per second.

6.4.2 The winding frequency


• Notice that we can adjust that second frequency however we want.
• Figure 8 represents wrapping it around faster. In this graph the petals are fatter because one cycle is now
completed in 0.81 seconds, which means that there should only be about 2.5 beats in one circle.
We see a lot of petals because we are representing the total 4.5 seconds of the original graph.
• Figure 9 represents wrapping it around slower (1 cycle in 5 seconds, so we should expect 15 petals in
one circle. We have space for 15 petals but we only have 12.5 petals because we are representing the
4.5 seconds of the original graph).
• Thus, the choice of winding frequency determines what the wrapped up graph looks like.
• Since the signal has 3 beats per second, we can do a funny thing!. We can wrap it up around the circle
such that it has a frequency of 3 cycles per second, which implies that it only has one petal in a circle.
Figure 10 depicts this case: the winding frequency matches the frequency of our signal (three beats
per second).
• In this case, the petal indicates that all the high points on the graph happen on the right side of the
circle and all of the low points happen on the left
• This is an interesting feature and we can we take advantage of that in our attempt to build a frequency-
unmixing machine.

6.4.3 The center of mass of the winding graph


• Imagine that the graph on the circle has some kind of mass to it, like a metal wire. We can represent
the center of mass of that wire by a little dot like in Figure 11
• As we change the frequency, and the graph winds up differently, that center of mass kind of wobbles
around a bit.
• For most of the winding frequencies, the peaks and valleys are all spaced out around the circle in such
a way that the center of mass stays pretty close to the origin.
• However, when the winding frequency is the same as the frequency of our signal, we have in this case
that all of the peaks are on the right and all of the valleys are on the left, so the center of mass is
unusually far to the right (See Figure 12)

6.4.4 The center of mass as a function of the winding frequency


• To capture the above, let’s draw some kind of plot that keeps track of where that center of mass is for
each winding frequency. Figure 13, 14, 15, 16, 17 and 18 depicts this.

• Of course, the center of mass is a two-dimensional thing, and requires two coordinates to fully keep
track of, but for the moment, let’s only keep track of the x coordinate.
• From Figure 13, 14, 15, 16, 17 and 18, we can observe that:
– For a frequency of 0, when everything is bunched up on the right, this x coordinate is relatively
high
– Then, as you increase that winding frequency, and the graph balances out around the circle, the
x coordinate of that center of mass goes closer to 0 and it just kind of wobbles around a bit.
– But then, at three beats per second, there’s a spike as everything lines up to the right.
• Notice that in this example we have that the original series is not around zero. This is why we have
a big number for frequency zero (The cosine wave is shifted up). If we allow the original series to be
around zero, then the spike for frequency zero does not show up and we only have a spike on frequency
3. See Figure 19
• Thus, we can clearly see how the winding frequency affects the center of mass
• Those graphs with the center of mass as a function of the winding frequency is an almost Fourier
Transform of the original signal
• This is super important! Imagine having as original series the sum of a 3 and a 2 beats per second
signals. Then, if we only look at the final series, we are not able to identify the pure signals. However,
if we are able to wrap it up in around circle and then make a graph of its center of mass as a function
of the winding frequency, we will be able to detect the spikes of the pure signals. See Figure 20
• Now what’s going on here with the two different spikes, is that if you were to take two signals and
then apply this Almost-Fourier transform to each of them individually, and then add up the results,
what you get is the same as if you first added up the signals, and then applied this Almost-Fourier
transform. See Figure 21

6.4.5 Winding the original series around a circle on the Complex Plane
• Now, what is missing for the Fourier Transform?
• Well, until now we were focusing on the x value of the center of mass. However, the circle is
graphed in a two-dimensional plane. So what about the y value of the center of mass?
• Well, we can think of it as the Complex Plane. Then, the center of mass is a complex number that
has both real and imaginary part
• Complex numbers lend themselves to really nice descriptions of things that have to do with winding
(a twisting movement or course) and rotation.
• For example, Euler’s Formula tells us that if you take e to the power of some real number times i,

z = eiθ

, you’re gonna land on the point that you get if you were to walk that number of units around a circle
with radius 1, counterclockwise starting on the right. See Figure 22
• So, imagine you wanted to describe rotating at a rate of one cycle per second:
– Recall that π = 3.1416 . . . and the full length of the circumference of a circle with radius 1 is 2π.
Thus we have that 1 cycle can be denoted by 2π.
– Then, you could start by taking the expression e2πti ,where t is the amount of time that has passed

– Then, decompose t in f times t so f captures the winding frequency. If f = 1, then we are in the
case of 1 cycle per unit time t (second, in this example)
– For example, if f = 1/10, this vector makes one full turn every 10 seconds since the time has to
increase all the way to 10 before the full exponent looks like 2πi. So, we have 1 cycle every 10
seconds or 1/10 cycle every 1 second
• Remember that the idea is to wrap up the series around a circle on the complex plane
• We can make it by using e2πf ti
• Let’s say that the original series can be described by g(t), then we have that the wrapping up around
the circle on the complex plane is given by:

g(t)e2πf ti for counterclockwise rotation

or

g(t)e−2πf ti for clockwise rotation, which is the convention

See Figure 23
• This is amazing, since this really small expression is a super elegant way to encapsulate the whole idea
of winding a graph around a circle with a variable frequency f .
• Now, remember we want to express the measure of mass a function of the frequency. How can we
obtain the center of mass?
• Well, we need to consider all the observed data (i.e., for all time) and let the only exogenous variable
in the function ϕ(.) be the frequency per unit of time.
• Thus, we could take the sum (for discrete time) or integral (for continuous time) of that expression
when the number of observations goes to infinity.
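The following is a minimal numerical sketch of the winding idea under the assumptions of this example (a pure 3-beats-per-second cosine observed for 4.5 seconds): for each winding frequency f we average g(t)·e^(−2πift) over the sample, which plays the role of the center of mass, and its magnitude spikes when f matches the signal frequency.

```python
import numpy as np

# Wind a 3-beats-per-second cosine around the unit circle at winding frequency f
# and take the average of g(t) * exp(-2*pi*i*f*t) as a proxy for the center of mass.
t = np.linspace(0, 4.5, 4000)
g = np.cos(2 * np.pi * 3 * t)              # pure signal: 3 cycles per second

for f in (0.5, 1.0, 2.0, 3.0, 4.0):
    center = np.mean(g * np.exp(-2j * np.pi * f * t))
    print(f, abs(center))                   # large near f = 3, small elsewhere
```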

Figure 6: Pure signal with 3 beats per second on the time domain

6.5 Fourier Transform - Formal Definition


Given a (non-stochastic) sequence xt, t = . . . , −1, 0, 1, . . ., we can define the Fourier Transform of xt as:

x̃(ω) = Σ_{t=−∞}^{∞} xt e^{−itω}

, in which
• ω ∈ (−π, π]:

Figure 7: Wrapping the pure signal with 3 beats per second up around a circle

Figure 8: Wrapping the pure signal with 3 beats per second up around a circle faster

Figure 9: Wrapping the pure signal with 3 beats per second up around a circle slower

Figure 10: The winding frequency matches the frequency of our signal (three beats per second)

Figure 11: The center of mass

Figure 12: The center of mass when the winding frequency matches the frequency of our signal

Figure 13: Graph of the center of mass as function of the winding frequency

Figure 14: Graph of the center of mass as function of the winding frequency

Figure 15: Graph of the center of mass as function of the winding frequency

Figure 16: Graph of the center of mass as function of the winding frequency

Figure 17: Graph of the center of mass as function of the winding frequency

Figure 18: Graph of the center of mass as function of the winding frequency

Figure 19: The original cosine wave when centered around zero

Figure 20: Decomposing the pure signals of an accumulated signal

Figure 21: Decomposing the pure signals of an accumulated signal

Figure 22: Euler’s Formula on the Complex Plane

Figure 23: Using Euler’s Formula to get the winding version of the original signal

– This is to put an explicit boundary to the frequency: we are limited to have something between
0 and 1 cycle, which in this case is 2π radians.
– Why? When we observe economic data for any unit of time that means that each point changes
every time t changes.
– That is, we cannot have a movement between t=1 and t=2 since by construction there is nothing
being measured there.
– For example, when observing GDP for t in months: when t goes from Dec2000 to Jan 2001, there
is no change in the value of GDP between those two units of time because it is not being measured.
– However if we go to the example of the music tone, we have that the air pressure reaches its max
3 times between t=1 second and t=2 seconds.
– Thus, we have 3 beats per second and eventually 3 cycles per second. That cannot happen with
economic data. Let’s imagine t=1,2,3,4. Then if GDP is 1000, 2000, 3000, 2000, we have that it
reaches its max every 3 months, so eventually we will have 1/3 cycles per month (or 2π/3 radians
per unit of time).
– Now, if GDP is 1000, 2000, 1000, 2000, we have that it reaches its max every 2 months, so
eventually we will have 1/2 cycles per month (or 2π/2 radians per unit of time).
– Thus, by construction when looking at economic data for any frequency of time, we will not have
something like 1,2,3,... cycles per second.
– As much we can have 1/2 cycles per unit of time (2π/2 radians per unit of time). Thus, having
ω ∈ (−π, π] makes a lot of sense.
• ω is the winding frequency but in a different measure:

ω = 2πf

It is radians per unit of time

• Any given, fixed value of ω says how quickly e^{−itω} circumnavigates the unit circle. For instance, ω = 2π
means a full rotation of the unit circle in 1 unit of t, that is, 1 cycle in one unit of time. This is
the same as f = 1, so we get ω = 2π, which means 1 cycle per unit of time.
• We can read it as ω radians per unit of time, which does not necessarily imply a full rotation or cycle. It is a full
rotation when ω = 2π.
• So if the unit of t is a quarter:
– ω=π:
∗ One pi radian per unit of time or half cycle per unit of time.
∗ This is like in the previous subsection when using f = 1/2, so we obtain the same exponent:
πti.
∗ That is 1 cycle every 2 units of time, which is equivalent to 0.5 cycle every 1 unit of time.
∗ If the true g(t) is such that xt = g(t) = cos(ωt), then we have that it has 1 beat every two
quarters.
∗ This means alternations every unit of time: in t we are on the valley, in t + 1, we are on the
peak, in t + 2 we are in the valley again.
– ω = π/2:
∗ half pi radians per unit of time.
∗ That is, 0.25 cycle per unite of time.
∗ This is like in the previous subsection using f = 1/4, so we obtain the same exponent: (π/2)ti.

∗ That is 1 cycle every 4 units of time, which is equivalent to 0.25 cycle every 1 unit of time.
∗ If the true g(t) is xt = cos(ωt), we have that it has 1 beat every 4 quarters.
∗ So we have annually recurring events.
• Recall, by Euler’s formula:

eiω = cos(ω) + i sin(ω)

e−iω = cos(ω) − i sin(ω)

• Using the exponential form of complex number we can express:

z = e−iω

, with the property of being on the unit circle

|z| = 1

• Thus, the Fourier transform can also be written as:


x̃(ω) = Σ_{t=−∞}^{∞} xt z^t

, with z = e−iω over ω ∈ (−π, π]

6.6 Fourier Inversion


The Fourier transform can be inverted to give the original sequence, term by term. When x̃ is the Fourier
transform of the sequence
xt = {. . . , x−1, x0, x1, . . .}
then

xj = (1/2π) ∫_{−π}^{+π} x̃(ω) e^{iωj} dω

for every integer j, j = . . . , −1, 0, 1, . . ..

6.6.1 Proof
• This follows from writing out the right side

(1/2π) ∫_{−π}^{π} x̃(ω) e^{iωj} dω = (1/2π) ∫_{−π}^{π} Σ_t xt e^{−iω(t−j)} dω = (1/2π)(2π)xj = xj

• after noticing that for all j ≠ t, we have that:

∫_{−π}^{π} e^{−iω(t−j)} dω = 0
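As a sanity check (hypothetical finite sequence, with everything outside it treated as zero), the sketch below builds x̃(ω) on a uniform frequency grid and recovers each xj by approximating the inversion integral with an average over the grid:

```python
import numpy as np

x = np.array([0.0, 2.0, -1.0, 0.5, 0.0])          # finite sequence, x_t for t = 0,...,4
t = np.arange(len(x))

omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)   # uniform grid of width 2*pi
x_tilde = (x[:, None] * np.exp(-1j * t[:, None] * omega)).sum(axis=0)

# x_j = (1/2pi) * integral of x_tilde(w) e^{i w j} dw  ~  average over the grid
for j in t:
    xj = np.mean(x_tilde * np.exp(1j * omega * j))
    print(j, round(xj.real, 6))                     # recovers 0, 2, -1, 0.5, 0
```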

6.7 Alternative Conventions for formal definitions


6.7.1 Putting the factor of 1/2π in the Fourier transform instead of in its inverse
• This is how Uhlig defines it in his notes (if you are interested in understanding sign identification).
• The Fourier Transform:

x̄(ω) = (1/2π) Σ_{t=−∞}^{∞} xt e^{−itω}

• Inverse Fourier transform:

xj = ∫_{−π}^{π} x̃(ω) e^{iωj} dω

6.7.2 Splitting the factor of 1/2π evenly between the Fourier transform and its inverse
• The Fourier Transform:

x̄(ω) = (1/√(2π)) Σ_{t=−∞}^{∞} xt e^{−itω}

• Inverse Fourier transform:

xj = (1/√(2π)) ∫_{−π}^{π} x̃(ω) e^{iωj} dω

6.8 Lag Operator Calculus and Fourier Transforms
• Consider

yt = h(L)εt

• For any form of h(L), we have that the Fourier Transform of yt is given by:

ỹ(ω) = h(e^{−iω}) ε̃(ω) = 2π h̃(ω) ε̃(ω)
• So, the Fourier Transform of yt is given by the lag polynomial evaluated at the unit circle times the
Fourier Transform of εt
• How is this possible? In the following lines I will show it

6.8.1 Case 1
• Consider

h(L) = h0 + h1 L

So, we have that:

yt = h0 εt + h1 εt−1

• The Fourier Transform of yt is given by:

ỹ(ω) = Σ_{t=−∞}^{∞} yt e^{−itω}

ỹ(ω) = Σ_{t=−∞}^{∞} (h0 εt + h1 εt−1) e^{−itω}

ỹ(ω) = h0 Σ_{t=−∞}^{∞} εt e^{−itω} + h1 Σ_{t=−∞}^{∞} εt−1 e^{−itω}

TRICK: We will try to write the second term so that it is indexed to t − 1

ỹ(ω) = h0 Σ_{t=−∞}^{∞} εt e^{−itω} + h1 Σ_{t=−∞}^{∞} εt−1 e^{−itω} e^{−iω} e^{iω}

ỹ(ω) = h0 Σ_{t=−∞}^{∞} εt e^{−itω} + h1 e^{−iω} Σ_{t=−∞}^{∞} εt−1 e^{−itω+iω}

ỹ(ω) = h0 Σ_{t=−∞}^{∞} εt e^{−itω} + h1 e^{−iω} Σ_{t=−∞}^{∞} εt−1 e^{−iω(t−1)}

Recall that the Fourier Transform of ε is given by:

ε̃(ω) = Σ_{t=−∞}^{∞} εt e^{−itω}

Given that the definition of a Fourier Transform implies that we are using all values of t (i.e., it goes
from negative infinity to positive infinity), we can simply change the indexation in the infinite
sum from t to t − 1 and it will still be the Fourier Transform of εt:

ε̃(ω) = Σ_{t=−∞}^{∞} εt−1 e^{−i(t−1)ω}

Therefore, we have that:

ỹ(ω) = h0 ε̃(ω) + h1 e^{−iω} ε̃(ω)

ỹ(ω) = (h0 + h1 e^{−iω}) ε̃(ω)

ỹ(ω) = h(e^{−iω}) ε̃(ω)

in which h(e^{−iω}) is the polynomial h(L) evaluated at L = e^{−iω}:

h(e^{−iω}) = h0 + h1 e^{−iω}

That is, we have that the lag polynomial h(L) is being evaluated on the unit circle given that e^{−iω} is
the exponential form of a complex number whose modulus equals 1:

|z| = 1 for z = e^{−iω}
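A small numerical check of this relationship (hypothetical coefficients and data; the sequence ε is taken to be zero outside the simulated window, so both transforms are finite sums):

```python
import numpy as np

rng = np.random.default_rng(3)
h0, h1 = 1.0, 0.6
eps = rng.standard_normal(50)                   # eps_t for t = 0..49, zero elsewhere

# y_t = h0*eps_t + h1*eps_{t-1}, nonzero for t = 0..50
y = h0 * np.append(eps, 0.0) + h1 * np.append(0.0, eps)

def ft(x, omega):
    """x_tilde(omega) = sum_t x_t e^{-i t omega} for a finite sequence indexed t = 0..T-1."""
    t = np.arange(len(x))
    return np.sum(x * np.exp(-1j * t * omega))

omega = 1.3
lhs = ft(y, omega)
rhs = (h0 + h1 * np.exp(-1j * omega)) * ft(eps, omega)
print(lhs, rhs)                                  # identical up to rounding error
```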

6.8.2 Case 2: AR(m)


• Consider

yt = ρ1 yt−1 + ρ2 yt−2 + . . . ρm yt−m + εt

• We can rewrite it as:

(1 − ρ(L))yt = ϵt

ϵt = (1 − ρ(L))yt

ϵt = h(L)yt

Then, we have that:

ε̃(ω) = h(e^{−iω}) ỹ(ω)

ε̃(ω) = (1 − ρ(e^{−iω})) ỹ(ω)

6.8.3 Case 3: MA(n)


• Consider

yt = εt + θ1 εt−1 + θ2 εt−2 + . . . θn εt−n

• We can rewrite it as:

yt = θ(L)εt

• Then, we have that:

ỹ(ω) = θ(e^{−iω}) ε̃(ω)
6.9 Spectral Density


6.9.1 Some Intuition
• From the previous section, we know that a time series Xt can be expressed as its Fourier Transform
x̃(ω)
• We can construct the autocovariance function between Xt and Xt+s :

γx (r, r + s) = Cov (Xr , Xr+s )

• If Xt is covariance stationary, then we have that:

γx (r, r + s) = γx (0, s) = γx (s)

That is, we can express the autocovariance function of Xt as a function of only one variable, s, instead
of two variables (r,r + s). This is key because it will allow us to generate a new time series.
• Since s denotes the autocovariance function of Xt at lag s, we could generate a new time series γx (s)
for s = 0, 1, 2, 3, 4, 5, 6, . . . .
• It is a time series because it is indexed to s, which is a unit of time (e.g., depending of the unit of time
for t, the lag can be in terms of years, months, quarters, seconds).
• Changing a bit the notation we could have the new time series:

γx (j) ≡ γj

for j = 0, 1, 2, 3, 4, 5, 6, . . . .

• Thus, we could also have the Fourier transform of γj :


γ̃(ω) = (1/2π) Σ_{j=−∞}^{∞} γj e^{−ijω}

, in which ω ∈ (−π, π] represents the frequency. This is known as the Spectral density of Xt
• This is extremely useful, since from the Spectral density of Xt we are able to find each autocovariance
of Xt , γj , by using the Fourier Inverse.

6.9.2 Formal Definition


• Let xt ∈ R, t = . . . , −1, 0, 1 . . . be covariance stationary with mean zero.
• Let’s denote the autocovariance function of xt as γj
– If xt is real-valued, we have that it is a symmetric function:

γj = E [xt xt−j ] = γ−j

– If xt is complex-valued, we have that it is a Hermitian function:

γj = E[xt x̄t−j] = γ̄−j

• Then, the Spectral Density of xt is given by the Fourier Transform of the autocovariance function
of xt :


sx(ω) = γ̃(ω) = Σ_{j=−∞}^{∞} γj e^{−ijω}

, in which ω ∈ (−π, π] represents the frequency.


• Since γj is symmetric about 0, i.e., γj = γ−j , the spectral density is also symmetric about 0:

sx (ω) = sx (−ω)

For multivariate case: sx (ω) = sx (−ω)′

6.9.3 Approximation of the spectral density by the second moment of the Fourier transform
• Notice that:

E[x̃(ω) x̃(ω)] = E[ ( Σ_{t=−∞}^{∞} xt e^{−itω} ) ( Σ_{t=−∞}^{∞} x̄t e^{itω} ) ]

(here the second x̃(ω) is understood as its complex conjugate, Σ_t x̄t e^{itω})

Let's suppose we only have 3 observations: t = −1, 0, 1:

E[x̃(ω) x̃(ω)] = E[ ( Σ_{t=−1}^{1} xt e^{−itω} ) ( Σ_{t=−1}^{1} x̄t e^{itω} ) ]

= E[ ( x−1 e^{−i(−1)ω} + x0 + x1 e^{−i(1)ω} ) ( x̄−1 e^{i(−1)ω} + x̄0 + x̄1 e^{i(1)ω} ) ]

= E[ x−1 x̄−1 + x−1 x̄0 e^{−i(−1)ω} + x−1 x̄1 e^{−i(−2)ω}
+ x0 x̄−1 e^{−i(1)ω} + x0 x̄0 + x0 x̄1 e^{−i(−1)ω}
+ x1 x̄−1 e^{−i(2)ω} + x1 x̄0 e^{−i(1)ω} + x1 x̄1 ]

Since when z = x + iy we have z̄ = x − iy, the conjugate of e^{−itω} is e^{itω}; further, recall z z̄ = |z|², and
for z = e^{−itω} we have |z| = 1; finally, notice that e^{−i(0)ω} = e^0 = 1. Collecting terms (the mean is zero, so E[xt x̄t−j] = γj):

= γ−2 e^{−i(−2)ω} + 2γ−1 e^{−i(−1)ω} + 3γ0 + 2γ1 e^{−i(1)ω} + γ2 e^{−i(2)ω}

Thus, it is approximately:

≈ Σ_{j=−2}^{2} γj e^{−ijω}

Therefore, for the more general case with infinitely many observations for xt, the Spectral density of xt can
be approximated by:

sx(ω) = γ̃(ω) = Σ_{j=−∞}^{∞} γj e^{−ijω}

≈ E[x̃(ω) x̃(ω)]

6.9.4 Finding the autocovariance: Fourier Inverse


• Since the Spectral density is a Fourier Transform, then we can always apply the Fourier Inverse to
find each coefficient inside the infinite sum of the Fourier Transform, which in this case is γj (i.e., each
autocovariance of xt )
• The Fourier Inverse of the Spectral density of xt gives us:

γj = (1/2π) ∫_{−π}^{π} sx(ω) e^{iωj} dω

, in which γj is the autocovariance of xt of order j for every integer j.

• For j = 0, we have that

γ0 = (1/2π) ∫_{−π}^{π} sx(ω) dω

Thus, we can interpret sx as spreading out x's variability across the range of frequencies ω ∈ (−π, π].

Example: White Noise
• Suppose

xt = εt

, where εt is a white noise with variance σ 2 .


• Thus, we have that for xt:

γj = σ² for j = 0, and γj = 0 otherwise

• Thus, the spectral density of xt is given by only one element:

sx(ω) = γ̃(ω) = Σ_{j=−∞}^{∞} γj e^{−ijω} = γ0 = σ²

Example: MA(1)
• Suppose

xt = εt + θ1 εt−1

, where εt is a white noise with variance σ 2 .


• Thus, we have that for xt :

γj = 0 for j > 1

That is, only γ0 , γ1 and γ−1 exist.


• Thus, the spectral density of xt is given by only three elements:

sx(ω) = Σ_{j=−∞}^{∞} γj e^{−ijω}

sx(ω) = γ0 + γ1 e^{−iω} + γ−1 e^{iω}

• Given that γ1 = γ−1:

sx(ω) = γ0 + γ1 (e^{−iω} + e^{iω})

• By using Euler's formula:

sx(ω) = γ0 + γ1 (cos ω − i sin ω + cos ω + i sin ω)

sx(ω) = γ0 + γ1 (cos ω + cos ω)

sx(ω) = γ0 + 2γ1 cos ω

Example: Stationary AR(1)


• Suppose

xt = ρxt−1 + εt

, where εt is a white noise with variance σ 2 and |ρ| < 1 so that xt is stationary.
• Thus, we have that for xt:

γj = ρ^{|j|} ( σ² / (1 − ρ²) )

for all integers j.

• Thus, the spectral density of xt is given by:

sx(ω) = Σ_{j=−∞}^{∞} γj e^{−ijω}

sx(ω) = Σ_{j=−∞}^{∞} ρ^{|j|} ( σ² / (1 − ρ²) ) e^{−iωj}

sx(ω) = ( σ² / (1 − ρ²) ) Σ_{j=−∞}^{∞} ρ^{|j|} e^{−iωj}

• Now, it is a more complicated result. However, we can use a property for the lag operator when finding
the spectral density

6.9.5 Lag Operator Calculus, Stationarity and the Spectral density


Theorem 7 Consider

yt = Σ_{j=0}^{∞} ψj L^j xt = ψ(L)xt

, in which xt is a (complex-valued) zero-mean stationary process.

• Then, if ψ(z) = Σ_{j=0}^{∞} ψj z^j converges on the unit circle, we have
that yt is stationary with its Spectral density given by:

sy(ω) = | Σ_{j=0}^{∞} ψj e^{−iωj} |² sx(ω)

sy(ω) = |ψ(e^{−iω})|² sx(ω)
So, the spectral density of yt in terms of the spectral density of xt involves the lag polynomial evaluated
on the unit circle, which implies that the lag polynomial evaluated on the unit circle must converge.
This is why stationarity plays an important role when finding the spectral density of an AR process.
Proof
• Notice that from Theorem 3 and Theorem 4, we immediately have that yt has a finite second
moment.
• Further, given that xt is stationary, yt is also stationary (Theorem 6).
• Now, let's allow both variables to be possibly complex valued. Then, since the autocovariance function
of yt is given by:

γy(h) = E(yt+h ȳt) = Σ_{j=0}^{∞} Σ_{k=0}^{∞} ψj ψ̄k γx(h − j + k), h = 0, ±1, . . .   (9)

The Fourier inverse of the spectral density of {xt} gives us:

γx(h) = ∫_{−π}^{π} sx(ω) e^{iωh} dω   (10)

Then, we can rewrite (9) as:

γy(h) = Σ_{j=0}^{∞} Σ_{k=0}^{∞} ψj ψ̄k ∫_{−π}^{π} sx(ω) e^{iω(h−j+k)} dω

= ∫_{−π}^{π} sx(ω) Σ_{j=0}^{∞} Σ_{k=0}^{∞} ψj ψ̄k e^{iω(h−j+k)} dω

= ∫_{−π}^{π} sx(ω) ( Σ_{j=0}^{∞} ψj e^{−iωj} ) ( Σ_{k=0}^{∞} ψ̄k e^{iωk} ) e^{iωh} dω

By rearranging the terms inside the integral, we obtain:

γy(h) = ∫_{−π}^{π} | Σ_{j=0}^{∞} ψj e^{−iωj} |² sx(ω) e^{iωh} dω   (11)

By using the Fourier inverse definition for the Spectral density, it must be that:

γy(h) = ∫_{−π}^{π} sy(ω) e^{iωh} dω   (12)

Then, the spectral density must be:

sy(ω) = | Σ_{j=0}^{∞} ψj e^{−iωj} |² sx(ω), ω ∈ [−π, π]   (13)

sy(ω) = |ψ(e^{−iω})|² sx(ω), ω ∈ [−π, π]   (14)

Recall |e^{−iω}| = 1. Thus, we are evaluating ψ(z) on the unit circle: |z| = 1
Case 1: Stationary AR(1)
• Consider

ϵt = (1 − ρL)yt

• We can express the above as:

ϵt = h(L)yt

with h(L) = 1 − ρL
• Then, we immediately have a relationship between the spectral densities:

sε(ω) = h(e^{−iω}) h(e^{iω}) sy(ω)

sε(ω) = (1 − ρe^{−iω})(1 − ρe^{iω}) sy(ω)

sε(ω) = (1 − ρe^{iω} − ρe^{−iω} + ρe^{−iω} ρe^{iω}) sy(ω)

sε(ω) = (1 − ρe^{iω} − ρe^{−iω} + ρ² e^{−iω+iω}) sy(ω)

sε(ω) = (1 − ρe^{iω} − ρe^{−iω} + ρ² e^0) sy(ω)

sε(ω) = (1 − ρe^{iω} − ρe^{−iω} + ρ²) sy(ω)

sε(ω) = (1 − ρ(e^{iω} + e^{−iω}) + ρ²) sy(ω)

• By using Euler's formula:

sε(ω) = (1 − ρ(cos ω − i sin ω + cos ω + i sin ω) + ρ²) sy(ω)

sε(ω) = (1 − 2ρ cos ω + ρ²) sy(ω)

• Given that εt is a white noise with variance σ²:

σ² = (1 − 2ρ cos ω + ρ²) sy(ω)

Thus, can we just simply write:

sy(ω) = σ² ( 1 / (1 − 2ρ cos(ω) + ρ²) )

? No. In short, the problem is that we cannot simply put (1 − 2ρ cos ω + ρ²)^{−1} on the RHS because it
could be the case that that polynomial does not converge
• Why do we need the process to be stationary? Well, we are almost done in finding the spectral
density of yt. However, we need to realize that we have:

σ² = h(e^{−iω}) h(e^{iω}) sy(ω)

, in which the lag polynomial h(L) is being evaluated at L = e^{−iω} and L = e^{iω}. Given that
|e^{−iω}| = |e^{iω}| = 1, we are analyzing the lag polynomial on the unit circle. From Section 2.3,
we know that h(L)^{−1} = (1 − ρL)^{−1} evaluated on the unit circle will converge only if |ρ| < 1,
which is the same condition for an AR(1) to be stationary.
• Thus, the Spectral density of a stationary AR(1) process is given by:

sy(ω) = σ² ( 1 / (1 − 2ρ cos(ω) + ρ²) )
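A quick numerical sketch (arbitrary parameter values): the closed form above can be compared with a truncated version of sy(ω) = Σj γj e^{−ijω}, using the AR(1) autocovariances γj = ρ^{|j|} σ²/(1 − ρ²) derived earlier.

```python
import numpy as np

rho, sigma2 = 0.7, 1.0
omega = np.linspace(-np.pi, np.pi, 7)

closed_form = sigma2 / (1.0 - 2.0 * rho * np.cos(omega) + rho**2)

# Truncated sum over j of gamma_j e^{-i j w}, with gamma_j = rho^|j| sigma2/(1-rho^2)
j = np.arange(-200, 201)
gamma = (rho ** np.abs(j)) * sigma2 / (1.0 - rho**2)
truncated = (gamma[:, None] * np.exp(-1j * np.outer(j, omega))).sum(axis=0)

print(closed_form.round(4))
print(truncated.real.round(4))    # matches the closed form
```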

Case 2: AR(m)
• Consider

ϵt = (1 − ρ1 L − ρ2 L2 − · · · − ρm Lm )yt

, in which εt is a white noise with variance σ 2


• Let λi be the roots of the polynomial:

π̃(z) = z m − ρ1 z m−1 − ρ2 z m−2 − · · · − ρm

Then, by using the Fundamental Theorem of Algebra we can write:

(1 − ρ1 L − ρ2 L2 − · · · − ρm Lm ) = (1 − λ1 L)(1 − λ2 L) . . . (1 − λm L)

• Thus, we can rewrite the process as:

ϵt = (1 − λ1 L)(1 − λ2 L) . . . (1 − λm L)yt

• Then, we immediately have a relationship between the spectral densities:

sϵ (ω) = (1 − λ1 e−iω )(1 − λ1 eiω )(1 − λ2 e−iω )(1 − λ2 eiω ) . . . (1 − λm e−iω )(1 − λm eiω )sy (ω)

• Given that the process is stationary, we can put on the LHS all the inverted lag polynomials evaluated
on the unit circle and by using the Euler’s formula as in the previous case we have that the spectral
density of an AR(m) process is given by:

sy(ω) = σ² Π_{j=1}^{m} ( 1 / (1 − 2λj cos(ω) + λj²) )

7 AR(P)
In this chapter we will use all the above results to analyze the properties of a time series AR(p)

7.1 The lag polynomial


• Consider the classic AR(p) process for Xt :

Xt = ρ1 Xt−1 + ρ2 Xt−2 + · · · + ρp Xt−p + εt (15)

, in which εt is i.i.d. with mean zero and variance 1


• Thus, we have:
(1 − ρ1 L − ρ2 L2 − · · · − ρp LP )Xt = εt

• Let’s define the lag polynomial:

π(L) = 1 − ρ1 L − ρ2 L2 − · · · − ρp LP

• Thus, we can express (15) as:


π(L)Xt = εt (16)

• Now, we would like to analyze the properties of Xt


• Thus, we would like to solve the difference equation for Xt
• To solve means to express Xt a function only of ε and initial values of Xt (i.e. X0 )

7.2 Solving the difference equation


• One way to solve it from looking at (15) is to use the Iterative method, but that can be really tedious
as P gets larger.
• One straightforward way to do it, from looking (16), is to premultiply π(L)−1 in (16).
• We would like π(L)−1 to be:
– Defined: in the sense that it exists. Thus, we avoid the values of L such that π(L) = 0
– Regular/ Analytic / Holomorphic: if it is regular around zero, the function can be expanded
in a Taylor’s Series that converges in the largest open disk (i.e., around zero since it is the
largest distance it can be made given the formula in Taylor’s Theorem) that does not contain any
singularity so that we can express π(L)−1 as a infinite sum that converges to something finite
(i.e., a power series that converges to that analytic function)
• So, it is super clear why we need the function π(L)−1 to be defined. Otherwise, it does not exists (i.e.,
1/0).
• Being well defined might seem to be enough, but it is not!

• If π(L)−1 exists, we can express Xt as function of εt so we solved the difference equation for Xt .
However, the solution implies a nonlinear function: π(L)−1 .
• Recall that we are interested in finding the properties of Xt (e.g., is it stationary?), and it is impossible
to do so if we try to analyze:

π(L)−1 εt

For instance, L does not mean anything by its own, it needs to be applied to a time series. Can I apply
it to εt using the expression above? . . . clearly NO! not even for the AR(1) case:

( 1 / (1 − ρ1 L) ) εt

• I know how to deal with L when it is all by itself (to any power) multiplying a time series. Then, the
question is can I express π(L)−1 in such a way that I can get that?
• Luckily for us the answer is YES!. We can take advantage of Taylor’s Theorem so we can express
π(L)−1 as a power series that converges in the largest open disk to π(L)−1 .
• For doing that, we need to check if the π(L)−1 is a regular function.
• If so, we can express π(L)−1 as an infinite sum that converges and we are able to analyze the features
of Xt since we will be able to apply L to εt .

7.3 Is the inverse of the lag polynomial well defined?


7.3.1 The roots of the characteristic polynomial
• If π(L) = 0, we have that π(L)−1 = ∞ so it is not well defined and π(L)−1 cannot be used
• Thus, I need to state that L is not taking the values of the roots in the polynomial π(L). Otherwise I
cannot invert it
• I know that L is the lag operator, but when analyzing its properties inside a function we treat it as a
variable, taking the most general case L ∈ C
• We want to know when do we get π(L) = 0
• Finding the values for L that makes π(L) = 0 it is called finding the roots
• The roots of a polynomial are the values of the variable in the polynomial that makes the polynomial
equal to 0
• Thus, we are interested in finding the roots of this polynomial
• To find the roots, we just need to find the roots in the characteristic function of π(z)
• Let’s denote the roots of the characteristic function π(z) as zi ∀i = 1, 2, 3 . . . , p.
• Once the roots are found, I just have to write that for any z ∈ C\ {z1 , z2 , z3 , . . . , zp }, the characteristic
function π(z)−1 is well defined.

7.3.2 An alternative characteristic polynomial: The reflected polynomial


The roots in the characteristic polynomial π(z) are equal to the inverses of the roots in its reflected polynomial π̃(z).

π̃(z) = z p π(z −1 ) (17)

• π̃(z) is known as the Reflected polynomial of π(z). Therefore, π(z) is also the Reflected polyno-
mial of π̃(z):
π(z) = z p π̃(z −1 ) (18)
Example:
π(z) = 1 − ρ1z − ρ2z²

π(z⁻¹) = 1 − ρ1z⁻¹ − ρ2z⁻²

π̃(z) = z²(1 − ρ1z⁻¹ − ρ2z⁻²)

Thus, (17) holds:

π̃(z) = z² − ρ1z − ρ2

Now,

π̃(z⁻¹) = z⁻² − ρ1z⁻¹ − ρ2

π(z) = z²(z⁻² − ρ1z⁻¹ − ρ2)

Thus, (18) holds:

π(z) = 1 − ρ1z − ρ2z²
Uhlig actually follows the same strategy to define the characteristic polynomial:
π̃(z) ≡ P (λ)

• Let’s denote λi as the roots in the characteristic polynomial π̃(z)


• Given (17), we know that the characteristic polynomials are related. So, are the roots also related?
• Indeed, the roots are related. When finding the roots of π̃(z), we are finding the values of z such that:

π̃(z) = 0

From (17), it is the same as finding the values of z such that:

z p π(z −1 ) = 0

Since those values cannot be zero, then we are finding the values of z such that:

π(z −1 ) = 0

Let’s denote zi as the roots of the characteristic polynomial, π(z). Given this notation, the roots of
π(z −1 ) should be zi−1 since π(z −1 ) is π(z) using the inverse variable. Therefore, we have that:

λi = zi−1 (19)
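To make (19) concrete, here is a small numerical sketch, assuming Python with NumPy is available; the AR(2) coefficients ρ1, ρ2 below are made-up illustrative values:

```python
import numpy as np

# Hypothetical AR(2) lag polynomial: pi(z) = 1 - 0.5 z - 0.3 z^2
rho1, rho2 = 0.5, 0.3

# numpy.roots expects coefficients from the highest power down:
# pi(z)        = -0.3 z^2 - 0.5 z + 1
# pi_tilde(z)  = z^2 pi(1/z) = z^2 - 0.5 z - 0.3
z_roots = np.roots([-rho2, -rho1, 1.0])        # roots z_i of pi(z)
lam_roots = np.roots([1.0, -rho1, -rho2])      # roots lambda_i of the reflected polynomial

print(np.sort_complex(lam_roots))
print(np.sort_complex(1.0 / z_roots))          # should coincide: lambda_i = 1/z_i
```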

7.4 Is the function a regular one?
• Notice that the lag polynomial is also a function.
• Checking if a complex function is regular follows Section 2.
• However, there is an issue. It gets more complicated to analyze π(z)−1 when p is greater than 1.
• Then, is there any trick to deal with it?
• The answer is YES! We will take advantage of the roots we have found for establishing when π(z)−1
is well defined by applying the Fundamental Theorem of Algebra

7.4.1 Inverting π(L)


• Notice that we can always use the roots to rewrite any characteristic equation (Fundamental The-
orem of Algebra: Finding roots of a polynomial is therefore equivalent to polynomial factorization
into factors of degree 1.) e.g.:
P(x) = x³ − 2x² − x + 2
In this case, the three roots of the polynomial P(x) are calculated such that

P (z) = 0

so, we have that the roots are: z1 = 2, z2 = 1 and z3 = −1. Thus, we can rewrite the polynomial as
follows:

x3 − 2x2 − x + 2 = (x − z1 )(x − z2 )(x − z3 )

x3 − 2x2 − x + 2 = (x − 2)(x − 1)(x + 1)

P (x) = (x − 2)(x − 1)(x + 1)

• So, for the reflected characteristic polynomial P̃ (x), we have:


P̃(x) = (x − 1/2)(x − 1)(x + 1)

• Further, taking into account that P (x) is also the reflected polynomial of P̃ (x) :

P(x) = x³ P̃(x⁻¹)


so,

P(x) = x³ (x⁻¹ − 1/2)(x⁻¹ − 1)(x⁻¹ + 1)

P(x) = x(x⁻¹ − 1/2) · x(x⁻¹ − 1) · x(x⁻¹ + 1)
We can also express P(x) in terms of the roots of its reflected characteristic polynomial as:

P(x) = (1 − (1/2)x)(1 − x)(1 + x)

For further on this please refer to section 1.7.

• Going back to the general case, we know that λi are the roots of the characteristic polynomial π̃(z).
Thus, λi are the values of z such that π̃(z) = 0
• We know that λi = zi−1 and zi are the roots of the characteristic polynomial π(z)
• Thus, we can also write:
     
π(z) = (1 − (1/z1)z)(1 − (1/z2)z)(1 − (1/z3)z) · · · (1 − (1/zp)z)     (20)

• Therefore, we can express π(z)−1 in a nicer way:


     
π(z)⁻¹ = [1/(1 − λ1z)] [1/(1 − λ2z)] [1/(1 − λ3z)] · · · [1/(1 − λpz)]     (21)

• This is a nicer way because, to know if π(z)⁻¹ is a regular function, we just have to check if each 1/(1 − λiz) is a regular function: If two functions f(z) and g(z) are analytic in a domain D, then their sum and their product are both analytic in D. For further details, check Section 1.10
• Based on our results in Section 2.4, we conclude that the characteristic function π(z)−1 is well defined
and analytic for all z ∈ C\L, in which:
 
L = {z ∈ C : Re(z) ≥ 1/λi and Im(z) = 0}

or

L = {z ∈ C : Re(z) ≥ zi and Im(z) = 0}

• Thus, from Taylor’s Theorem we have that π(z)−1 can be expressed as a power series with z0 ∈ C\L:


f(z) = Σ_{n=0}^∞ [f⁽ⁿ⁾(z0)/n!] (z − z0)ⁿ

where the series converges on any disk |z − z0 | < r contained in C\L. Since 0 ∈ C\L, we select z0 = 0
(largest open disk). Thus, the series converges on the largest open disk only if:

|λi z| < 1

• We know that we can find the values of the roots λi. However, a natural question that emerges at this point is what z is doing in there. Well, remember that z is just the argument of the characteristic function, and it is originally the variable L. But it doesn't help too much to know that. I mean, what is the intuition of choosing a value for L? It was supposed to be a trick just to get along with time series.
We know for sure that we need z such that:

z ∈ C\L

and

|λi z| < 1

A set of values for z that satisfies both conditions is those with |z| = 1, that is, when π(z)⁻¹ is evaluated on the unit circle. We will pick this up later because we need it for the spectral density.

7.4.2 Is the characteristic polynomial π(z)−1 analytic on the unit circle, |z| = 1?
• So, we have that π(z)−1 can be expressed as an infinite sum as long as the value of z is such that:

z ∈ C\L

and

|λi z| < 1

• We said that we usually pick up |z| = 1, but why?


• To better understand this, recall that we are interested in analyzing the properties of Xt
• Let’s assume that π(z)−1 is a regular function that converges on the largest open disk. That is, we
have z and λi satisfying the conditions listed above.
• If so, we have for AR(1):


π(z)⁻¹ = Σ_{i=0}^∞ (λ1z)ⁱ

Notice that for the AR(1), λ1 = ρ. Then, using the inverse lag polynomial (that is, using L instead of z), we have:


π(L)⁻¹ = Σ_{i=0}^∞ (ρL)ⁱ

Replacing it on the DGP for Xt :


Xt = Σ_{i=0}^∞ (ρL)ⁱ εt

By applying the lag operator to εt:


Xt = Σ_{i=0}^∞ ρⁱ εt−i

• Then, to know the properties of Xt , we need to study the properties of:


Σ_{i=0}^∞ ρⁱ εt−i

As this is an infinite sum, we would like to know if this converges or not. Be careful! Having π(z)−1
as a regular function is just to be able to express the function as an infinite sum. However, after
replacing the infinite sum for z = L in the DGP of Xt , we have an infinite sum that involves εt . So, to
fully analyze Xt , we need to analyze the infinite sum involving εt .
• Given that this infinite sum involves a random variable, εt, we know that we are talking about convergence in probability and convergence in the q-th moment. (See Section 4.)

• Thus we would like to be able somehow to apply Theorem 3
• We said that we needed to assume z and λi to satisfy the conditions for π(z)−1 to be regular. A set of
values that satisfies both requirements is given by:

|z| = 1

|λi | < 1

• If so, we have that for the AR(1), we have:


Σ_{i=0}^∞ ρⁱ < ∞

Given that |λ1 | < 1, we also have that


Σ_{i=0}^∞ |ρⁱ| < ∞

Let's denote:

ψj = ρj

Thus, we have one of the two conditions of Theorem 3.


Σ_{j=0}^∞ |ψj| < ∞

The other condition is satisfied because we assumed εt is i.i.d. with zero mean and variance 1. Thus,
it is a particular case of the one in Theorem 3.
• From Theorem 3, we immediately know that the sum converges in the 1st moment and therefore, by the Markov inequality, it converges in probability.
• Further, from Theorem 4, we immediately know that it converges in the second moment as well.
• Thus, we know that Xt has a finite second moment, which implies a finite first moment. That is, it has a finite variance and a finite mean.
• In short,
1. We need to be able to express π(L)−1 as an infinite sum to analyze Xt . Otherwise we simply
cannot do it.
2. Thus we need z and |λi | to be such that:

z ∈ C\L

and

|λi z| < 1

3. Assuming those is not enough, given that now we are dealing with an infinite sum that involves εt
4. If |λ1 | < 1, and given the properties of εt , the conditions in Theorem 3 and Theorem 4 are satisfied
and we conclude that Xt has a finite variance
• However, we would be using the same tools and therefore arriving at the same conclusions if we just proceed to analyze whether the characteristic function π(z)⁻¹ is holomorphic on the unit circle and converges on the largest open disk.
• Thus, to know if Xt converges in the second moment, we just need to check if the roots, λi, of the characteristic polynomial π̃(z) are such that:

|λi | < 1
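As a small sanity check of this argument, a sketch (assuming NumPy; ρ and the truncation length J are arbitrary choices) comparing a simulated stationary AR(1) with its truncated MA(∞) representation Σ ρⁱ εt−i:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, T, J = 0.7, 500, 200            # |rho| < 1, sample size, truncation of the infinite sum
eps = rng.standard_normal(T + J)     # i.i.d. shocks with mean 0 and variance 1

# AR(1) recursion: X_t = rho * X_{t-1} + eps_t
x_ar = np.zeros(T + J)
for t in range(1, T + J):
    x_ar[t] = rho * x_ar[t - 1] + eps[t]

# Truncated MA(infinity): X_t ~ sum_{i=0}^{J-1} rho^i eps_{t-i}
weights = rho ** np.arange(J)
x_ma = np.array([weights @ eps[t - J + 1:t + 1][::-1] for t in range(J, T + J)])

# After the burn-in the two representations are essentially identical,
# and the sample variance is close to 1 / (1 - rho^2)
print(np.max(np.abs(x_ar[J:] - x_ma)))          # tiny truncation error (~ rho^J)
print(x_ar[J:].var(), 1 / (1 - rho**2))
```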

7.5 Is the process stationary?


• From Theorem 6, we have that Xt is stationary if:
1. Xt = Σ_{j=0}^∞ ψj εt−j
2. Σ_{j=0}^∞ |ψj| < ∞
3. εt is stationary

• The first requirement is to be able to express Xt as an infinite sum, which can be achieved if |λi | < 1
• The second requirement is immediately satisfied if |λi | < 1 given that ψj is just a multiplication of λi
(see Section 7.4.1)
• The third requirement is satisfied by definition of εt
• Therefore, we arrive at the same conditions as in the previous section: For Xt to be stationary we only
need to check if the roots, λi , in the characteristic polynomial π̃(z) are such that:

|λi | < 1

, which is the same as saying:

λi ∈ the open unit disk

7.6 The Spectral Density


• If the process is stationary, we can find the spectral density of Xt by applying Theorem 8.
• Now, it is crystal clear why we care about the case when the lag polynomial is on the unit circle.

7.7 Impulse response Function
• Consider the mean zero (de-meaned) weakly stationary AR(1) model.

yt = β1 yt−1 + εt

• We might want to know what we should expect the future value of yt+k to look like if yt were one unit
larger holding all yt−j j > 0 fixed.
• This is equivalent to asking how we should expect yt+k to change given a one unit change in εt .
• WHY? From the DGP of yt, if we hold yt−1 fixed, the only way to increase yt is through increases in εt
• The impulse response function is the path that y follows if it is kicked by a single unit shock ϵt , i.e.,
ϵt−j = 0, ϵt = 1, ϵt+j = 0.
• This function is interesting since it allows us to start thinking about ”causes” and ”effects”
• For example, you might compute the response of GDP to a shock to GDP and interpret the result as
the ”effect” on GDP of a shock over time.
• The M A(∞) representation is the same thing as the impulse response function.
• Thus, for a stationary AR(p) process, we know that its stationary solution is given by:

yt = Σ_{j=0}^∞ ψj εt−j

This is also known as the MA(∞) representation or the impulse-response function. From this representation we can calculate the effect of a single shock over time.

For instance, the impact of a one-unit increase in εt−k on yt is given by:

dyt / dεt−k = ψk

• When we plot the change in yt given a one unit shock to εt−k we call this an impulse response
function (IRF).
• A stationary process has the property that the effect of a shock does not last forever. That is, if a shock occurs at t, eventually (as k goes to infinity) ψk goes to zero:

dyt+k / dεt = ψk → 0 as k → ∞

• We can clearly see that with the stationary AR(1) case, in which:

ψj = ρj

and as it is stationary we have |ρ| < 1. So:

ρj → 0 as j → ∞

• Figure 24 represents the IRF for a stationary AR(1) process.

• Interpretation? If there is an unexpected increase of one additional dollar traded, how much does
dollar volume change k-periods ahead?

Figure 24: IRF for a stationary AR(1) process
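A minimal sketch of how such an IRF can be computed and inspected (assuming NumPy; ρ is an arbitrary illustrative value):

```python
import numpy as np

rho = 0.8                      # stationary AR(1) coefficient, |rho| < 1
horizons = np.arange(0, 21)    # k = 0, 1, ..., 20
irf = rho ** horizons          # psi_k = rho^k: effect on y_{t+k} of a unit shock eps_t

for k, psi in zip(horizons, irf):
    print(f"h={k:2d}  dy(t+h)/d eps(t) = {psi:.4f}")   # decays geometrically toward 0
```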

• Having an IRF that vanishes over time is a consequence of being a stationary process. However, being stationary is a sufficient condition but not a necessary one. That is, we can have nonstationary processes whose IRF vanishes over time.

8 Testing for the presence of Unit Root


8.1 Motivation
• As an example, let’s focus on the AR(1) case.
• We know that if |ρ| = 1, yt is nonstationary.
• |ρ| = 1 is when the root is on the unit circle (i.e., we have a unit root)
• Thus, we would like to test the hypothesis if the process is stationary or not.
• That is, we would like to know if we have a unit root.
• We would like to apply the usual t-test or Wald test. Can we?
• Sadly, we cannot.
• Consider the simple AR(1) model

yt = ϕyt−1 + εt, where εt ∼ WN(0, σ²)


The hypotheses of interest are

H0 : ϕ = 1( unit root in ϕ(z) = 0) ⇒ yt ∼ I(1)


H1 : |ϕ| < 1 ⇒ yt ∼ I(0)

The test statistic is


tϕ=1 = (ϕ̂ − 1) / SE(ϕ̂)

where ϕ̂ is the estimator of ϕ and SE(ϕ̂) is the usual standard error estimate.
The test is a one-sided left tail test. If {yt } is stationary (i.e., |ϕ| < 1) then it can be shown

√T (ϕ̂ − ϕ) →d N(0, 1 − ϕ²)

or

ϕ̂ ∼A N(ϕ, (1 − ϕ²)/T)
and it follows that tϕ=1 ∼A N(0, 1). However, under the null hypothesis of nonstationarity the above result gives

ϕ̂ ∼A N(1, 0)
which clearly does not make any sense.

8.2 The most common unit root tests


• Testing for the presence of a unit root in a time series is now a common starting point that is available
as an option in several popular statistical packages.
• For some surveys, see Stock (1994) and Haldrup and Jansson (2006).
• Among the most used statistical tests is the Augmented Dickey-Fuller (ADF) test proposed by Said and Dickey (1984), based on Dickey and Fuller (1979).
• Another set of statistics is the family of M tests that was originally proposed by Stock (1999) and
further analyzed by Perron and Ng (1996).
• The M tests are composed of three statistics: MZα, MSB, and MZt.
• Elliott, Rothenberg and Stock (1996) proposed a feasible point optimal test, P_T^GLS, and an ADF^GLS test, both of which are constructed using GLS detrended data in order to increase the power performance of the tests.
• Ng and Perron (2001) applied the same strategy to the family of M tests, as well as to a feasible point optimal test denoted MP_T^GLS.

8.3 The Dickey-Fuller (DF) Statistic


• Based on Dickey and Fuller (1979)
• Let’s go back to our beloved AR(1) case:

yt = ρyt−1 + ut

where yt is the variable of interest, t is the time index, ρ is a coefficient, and ut is the error term
(assumed to be white noise).
• A unit root is present if ρ = 1. The model would be non-stationary in this case.
• We can rewrite the model such that:

∆yt = (ρ − 1)yt−1 + ut = δyt−1 + ut

where ∆ is the first difference operator and δ ≡ ρ − 1.


• This model can be estimated, and testing for a unit root is equivalent to testing δ = 0.
• It is not possible to use standard t-distribution to provide critical values because under the null, yt−1
is not stationary and therefore CLT fails.
• Under the null this statistic t has a specific distribution simply known as the Dickey-Fuller distribution.

• Therefore, to reject the null we just have to compare the t statistic against the critical values of the Dickey-Fuller distribution.
• It is worth pointing out that the critical values depend on the deterministic component of the DGP.
• There are three main versions of the test:
1. Test for a unit root:
∆yt = δyt−1 + ut

2. Test for a unit root with constant:

∆yt = a0 + δyt−1 + ut

3. Test for a unit root with constant and deterministic time trend:

∆yt = a0 + a1 t + δyt−1 + ut

8.4 The Augmented Dickey Fuller - ADF


• Based on Said and Dickey (1984)
• The DF test assumes that ut is an i.i.d. process and that the DGP for yt is an AR(1) process
• However, in general it might be the case that the DGP for yt is an AR(p) and that ut is serially correlated.
• If so, we should use the ADF equation

∆yt = a0 + a1 t + δyt−1 + β1 ∆yt−1 + · · · + βk ∆yt−k + εt (22)

• For testing about the unit root, we use the same critical values from the DF distribution.
• The t distribution can be used for testing about βi for i = 1, 2, . . . , k
• However, a question remains: what is the value of k?
• Using Monte Carlo experiments, Schwert (1989) showed that the value of k has important implications
for the size and power of the ADF test, in particular when there is strong negative moving average
correlation in the residuals.
• Ng and Perron (1995) constitutes the first study dealing with the analysis of lag-length selection using
different criteria.
• They prove that the choice of the data-dependent rule has a bearing on the size and power of the test.
• Moreover, they show that information-based rules such as the Akaike information criteria (AIC) and
Bayesian information criteria (BIC) tend to select values of k that are consistently smaller than those
chosen through sequential testing for the significance of coefficients on additional lags (t-sig method),
and that the size distortions associated with the former methods are correspondingly larger.
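In practice, the ADF test is available in standard packages; a minimal sketch assuming statsmodels is installed (the simulated random walk is only an illustration):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(500))   # a random walk, so the null of a unit root is true

# regression='c' includes a constant; autolag='AIC' picks k (see the discussion on lag selection)
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}, lags used = {usedlag}")
print("critical values:", crit)           # compare the statistic against these DF critical values
```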

8.5 The Phillips Perron Test


• The DF and ADF forces us to choose the optimal number of lags of the dependent variable or the
forecast variable in question.
• Moreover, the functional form of the model must also be specified. Do we include a trend, is there an
intercept, do we use both a time trend and an intercept?

• The Dickey-Fuller and the Augmented Dickey-Fuller tests have been found to have low power (the power of a test is the probability of rejecting the null hypothesis when it is false) in some circumstances.
– Consider a model in which ϕ = 0.95.
– By all accounts, it meets our criteria for a stationary process, but the result of the test may indicate non-stationarity, especially in data with a small sample size.
• Based on Phillips and Perron (1988)
• The P P Test corrects for any serial correlation and heteroscedasticity in the errors by some direct
modification to the test statistics.
• This modification is a nonparametric one.
• The P P has no need to specify the lag length.

8.6 The ADF − GLS test


• In their 1996 Econometrica article, Elliott, Rothenberg and Stock (ERS) proposed an efficient test,
modifying the Dickey-Fuller test statistic using a generalized least squares (GLS) rationale.
• They demonstrate that this modified test has the best overall performance in terms of small-sample
size and power, conclusively dominating the ordinary Dickey-Fuller test.
• In particular, Elliott et al. find that their ”DF-GLS” test ”has substantially improved power when an
unknown mean or trend is present.” (1996, p.813)
• Let zt = (1, t).
• For the time series yt , regress [y1 , (1 − αL)y2 , . . . , (1 − αL)yT ] on [z1 , (1 − αL)z2 , . . . , (1 − αL)zT ] yield-
ing
β̃GLS
where α = 1 + c̄/T , and c̄ = −13.5 for the detrended statistic.
• Detrended ỹt = yt − zt′ β̃GLS is then employed in the ADF equation, with no intercept nor time trend.
• The t-statistic on ỹt−1 is the DF-GLS statistic.
• For the demeaned case, the t is omitted from zt , and c̄ = −7.0.
• To reject the null we compare the t statistic with the critical values of the DF − GLS distribution that
depends on the deterministic component.
• ERS (1996) show that the choice of k has a considerable effect on the size (size of a test is the probability
of incorrectly rejecting the null hypothesis if it is true) of ADF GLS .
• In order to select k, they try AIC, BIC and sequential likelihood ratio statistics. Finally, they use the
BIC to select k by setting the lower bound at 3 , because with zero as the lower bound larger size
distortions result.
• Ng and Perron (2001) consider a class of Modified Information Criteria (M IC) with a penalty factor
that is sample dependent.
• In the Monte-Carlo experiments, they find that the M IC yields huge size improvements on the
ADF GLS .
• They also show that both the use of the M IC and allowing for GLS data detrending in the M test
results in a class of M GLS tests that have desirable size and power properties.
• In conclusion, the M IC (in particular the M AIC version) is a superior rule for selecting lag length.
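A sketch of the GLS detrending step described above (assuming NumPy), following the quasi-differencing recipe in the bullets; treat it as an illustration rather than a full DF-GLS implementation. The DF-GLS statistic is then the ADF t-statistic computed on the detrended series with no intercept or trend:

```python
import numpy as np

def gls_detrend(y, trend=True):
    """GLS-detrend y as described above: quasi-difference y and z with alpha = 1 + c/T."""
    T = len(y)
    c_bar = -13.5 if trend else -7.0
    alpha = 1.0 + c_bar / T
    z = np.column_stack([np.ones(T), np.arange(1, T + 1)]) if trend else np.ones((T, 1))
    # Quasi-differenced data: the first observation is kept in levels
    y_q = np.concatenate(([y[0]], y[1:] - alpha * y[:-1]))
    z_q = np.vstack((z[:1], z[1:] - alpha * z[:-1]))
    beta_gls, *_ = np.linalg.lstsq(z_q, y_q, rcond=None)
    return y - z @ beta_gls   # detrended series to feed into an ADF regression without deterministics
```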

8.7 The M statistics
• Perron and Ng (1996) showed that for an AR(1) and an MA(1) with a negative coefficient close to −1, the PP test exhibits strong size distortions (it rejected the null far too often when the null was true).
• Thus, they took up the M tests, which were originally proposed by Stock (1999).
• The M statistics are composed of three statistics: MZα, MSB, and MZt.
• They performed better than the P P statistic.
• While the power gains of the DF GLS from using GLS detrended data are impressive, simulations also
show that the test exhibits strong size distortions when dealing with an M A(1) process with a negative
coefficient.
• Since the power gains from the DF GLS over the DF come from the use of GLS detrended data, it is
natural to consider the M tests under GLS detrending.
• Ng and Perron (2001) analyze the asymptotic properties of the M^GLS tests.
• Ng and Perron (2001) extend the M tests developed in Perron and Ng (1996) to allow for GLS
detrending of the data.
• They also show that both the use of the M IC and allowing for GLS data detrending in the M test
results in a class of M GLS tests that have desirable size and power properties.
• In conclusion, the M IC (in particular the M AIC version) is a superior rule for selecting lag length.

9 VAR (p)
9.1 Motivation
• So far we have focused mostly on models where y depends on past observations of y.
• More generally we might want to consider models for more than one variable.
• If we only care about forecasting one series but want to use information from another series we can
estimate an ARMA model and include additional explanatory variables.
• For example if yt is the series of interest, but we think xt might be useful we can estimate models like

yt = β0 + β1 yt−1 + γxt−1 + εt

• This model can be fit by least squares. Our dependent variable is yt and the independent variables are
yt−1 and xt−1
• Once the model is fit, the one-step ahead forecast is given by:

E (yt+1 | Ft ) = β0 + β1 E (yt | Ft ) + γE (xt | Ft ) = β0 + β1 yt + γxt

• Just like the simple AR model, the one step ahead forecast variance is σε2 .
• A joint model for xt and yt is required if we are interested in multiple step ahead forecasts, or if we
are interested in feedback effects from one process to the other.
• For example, if we want the 2-step-ahead forecast for yt, we are looking for

E (yt+2 | Ft ) = β0 + β1 E (yt+1 | Ft ) + γE (xt+1 | Ft )

Then, the obvious question is what do we use for:

E (xt+1 | Ft )
since xt+1 is not known at t
• Answer: We need a model for x as well
• Before proceeding to the next section, it is important to review Appendix A.

9.2 Companion Form Representation of an AR(p) Model


• Can we convert an AR(p) back into an “AR(1)” type model?
• Consider the AR(p) model:
yt = Σ_{j=1}^p βj yt−j + εt

• Next we define the new vectors and matrix:

ξt = (yt, yt−1, . . . , yt−p+1)′   (p × 1),   vt = (εt, 0, . . . , 0)′   (p × 1),

F = [ β1  β2  · · ·  βp−1  βp
      1   0   · · ·  0     0
      0   1   · · ·  0     0
      ⋮             ⋱      ⋮
      0   0   · · ·  1     0 ]   (p × p)

• Then we can write the AR(p) model as the following first order model:

ξt = F ξt−1 + vt (23)

• This is known as the Stacked VAR(1) or the companion form representation

9.2.1 Stationarity of the Stacked VAR(1)


• From section 7.5, we know that the AR(p) is stationary if the roots of the characteristic polynomial,
λi , :

π̃(z) = z p π z −1


are inside the unit circle:

|λi | < 1

or if the roots of the characteristic polynomial, zi , :

π(z)

are outside the unit circle:

|zi | > 1

• From the stacked form, we have that F is an square matrix and thus, we could find its eigenvalues:

|F − λI| = 0

, which is the same as |λI − F | = 0


• We know that |F − λI| is a polynomial of degree p. Given the composition of F , we have that:

|F − λI| = π̃(λ)

, so to know if the stacked VAR(1) is stationary is the same as knowing if the eigenvalues of F are
inside the unit circle:

λi ∈ the open unit disk
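A small sketch (assuming NumPy) that builds F for an AR(p) and checks the stationarity condition through its eigenvalues; the coefficients are arbitrary illustrative values:

```python
import numpy as np

def companion_matrix(betas):
    """Companion (stacked VAR(1)) matrix F for an AR(p) with coefficients beta_1, ..., beta_p."""
    p = len(betas)
    F = np.zeros((p, p))
    F[0, :] = betas                      # first row: (beta_1, ..., beta_p)
    F[1:, :-1] = np.eye(p - 1)           # identity block shifted below the first row
    return F

F = companion_matrix([0.5, 0.2, 0.1])    # hypothetical AR(3)
eigvals = np.linalg.eigvals(F)
print(eigvals, np.all(np.abs(eigvals) < 1))   # True => eigenvalues inside the unit circle => stationary
```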

9.3 The VAR(1) model with 2 variables


• Suppose that we have 2 variables that we observe at time period t and we consider the joint model:

xt = β0x + β1x xt−1 + β2x yt−1 + uxt
yt = β0y + β1y xt−1 + β2y yt−1 + uyt     (24)

• Each equation is like an AR(1) model with one other explanatory variable.
• Each equation depends on its own lag and the lag of the other variable.
• We also now have two errors, one for each equation: uxt and uyt
• Since x depends on y and y depends on x, a more thorough understanding of dynamics and forecasting
requires us to jointly consider x and y in the system of equations.

9.3.1 Matrix Notation


• We can write the model in matrix notation:

(xt)   (β0x)   (β1x  β2x) (xt−1)   (uxt)
(yt) = (β0y) + (β1y  β2y) (yt−1) + (uyt)     (25)

• By defining the following vectors and matrices we end up with a very simple form for the VAR(1)
model. Let:

yt = (xt, yt)′,   vt = (uxt, uyt)′,   β0 = (β0x, β0y)′,   and   β1 = [β1x  β2x; β1y  β2y]

Then, the model in matrix form is given by:

yt = β 0 + β 1 yt−1 + vt (26)

9.3.2 Assumptions on the errors
• The errors are white noise and uncorrelated with lags of the other errors.
• uxt is uncorrelated with uxt−j and uyt−j for j ̸= 0.
• uyt is uncorrelated with uxt−j and uyt−j for j ̸= 0.
• However, they may be contemporaneously correlated. If so, we call them reduced shocks since they come from a more complete model that allows for a contemporaneous relationship between xt and yt. It might be the case that the system is such that there is a contemporaneous relationship:

xt = β0x + αx yt + β1x xt−1 + β2x yt−1 + εxt
yt = β0y + αy xt + β1y xt−1 + β2y yt−1 + εyt     (27)

Let's define:

α = [1  −αx; −αy  1]

εt = (εxt, εyt)′

Thus,

yt = α−1 β 0 + α−1 β 1 yt−1 + α−1 εt (28)

so,

vt = α−1 εt

If we assume that εt are structural shocks (i.e., the very first and purest shocks in the economy, not related to each other for any t), we have that the reduced shocks are contemporaneously related:

σuxuy = E[uxt uyt] − E[uxt]E[uyt] = E[(εxt + αx εyt)(αy εxt + εyt)] = αy σ²εx + αx σ²εy ≠ 0

• Ω denotes the variance covariance matrix of the reduced error:

Ω = [σ²ux  σuxuy; σuxuy  σ²uy]

9.3.3 Solving the system of equations: Inverting the matrix


• First, it is worth pointing out that it can also be solved by iterating.
• W.L.O.G., let's assume the intercept is zero
• Now we have:
yt = β 1 yt−1 + vt

• Let’s use the lag operator:
yt = β 1 Lyt + vt

(I − β 1 L)yt = vt

• So, we would like to be able to express the solution as:

yt = (I − β 1 L)−1 vt

• To do so, first we need to check if the inverse exists. That is, if (I − β1L) is a nonsingular matrix. So, we need to check if:
1. (I − β1L) is a square matrix
2. |(I − β1L)| ≠ 0, so we have:

(I − β1L)⁻¹ = [1/|(I − β1L)|] C′

, in which C′ is the transpose of the cofactor matrix
Condition 1 is satisfied given that β 1 is an square matrix.
• For condition 2, notice that |(I − β 1 L)| is a second order polynomial on L:

|(I − β1L)| = |1 − β1xL   β2xL; β1yL   1 − β2yL|

|(I − β 1 L)| = (1 − β1x L)(1 − β2y L) − β2x Lβ1y L

|(I − β 1 L)| = 1 − β2y L − β1x L + β1x Lβ2y L − β2x β1y L2

|(I − β 1 L)| = 1 − (β2y + β1x )L − (β2x β1y − β1x β2y )L2

Thus, we have that:

|(I − β 1 L)| = π(L) = 1 − π1 L − π2 L2 (29)

• Thus, condition 2 is the same as checking that π(L)−1 exists.


• Further, we are in a similar situation as in the usual AR(p) case. That is, we would like:
1. π(L)⁻¹ < ∞
2. (1 − π1L − π2L²)⁻¹ has no meaning in time series by itself (i.e., how do I multiply it by the error term?). So, we would like π(z)⁻¹ to be a regular function so we can use Taylor's Theorem around 0 and express it as an infinite sum.
3. Check if the power series converges when |z| = 1 (so we know that the infinite sum times the error term (i.e., the RHS) converges in the second moment)

• We know that we can find the roots of the characteristic polynomial π(z), zi , and express it as:
  
π(z) = (1 − (1/z1)z)(1 − (1/z2)z)

• Further, we know that, the characteristic function (1 − ρz)−1 is regular (analytic) for all z ∈ C\L, in
which

L = {z ∈ C : Re(z) ≥ zi and Im(z) = 0}

Thus, from Taylor’s Theorem we have that π(z)−1 can be expressed as a power series with z0 ∈ C\L.
• The Taylor series expansion around zero is valid for |zi⁻¹ z| < 1
• For |z| = 1, the Taylor series expansion around zero is valid if

|zi | > 1

Otherwise, the infinite sum does not converge to the function (1 − ρz)−1
• Thus, we need the roots of the characteristic polynomial |I − β 1 z| to be outside the unit circle.
• If so, we have that the solution for the system is given by:

yt = (I − β1L)⁻¹ vt

yt = [1/|(I − β1L)|] C′ vt

yt = [1/(1 − π1L − π2L²)] C′ vt

yt = [(1 − (1/z1)L)(1 − (1/z2)L)]⁻¹ C′ vt

yt = (1 − (1/z1)L)⁻¹ (1 − (1/z2)L)⁻¹ C′ vt

yt = [Σ_{j=0}^∞ (1/z1)ʲ Lʲ] [Σ_{j=0}^∞ (1/z2)ʲ Lʲ] C′ vt

yt = Σ_{j=0}^∞ ψj Lʲ C′ vt

yt = Σ_{j=0}^∞ ψj Lʲ [1 − β2yL   −β2xL; −β1yL   1 − β1xL] vt

yt = Σ_{j=0}^∞ [c1j  c2j; c3j  c4j] Lʲ vt

yt = Σ_{j=0}^∞ Cj Lʲ vt

• So, we have an infinite MA representation for each variable with two errors. Using Theorem 6, we know that the solution not only converges in the second moment but is also a stationary solution.
• Therefore, if the roots of the characteristic polynomial |I−β 1 z| are outside the unit circle, the stationary
solution of the system is given by:


yt = Σ_{j=0}^∞ Cj Lʲ vt     (30)

9.3.4 The roots of the characteristic polynomial and the eigenvalues


• Notice that the characteristic polynomial |Iz − β 1 | is the reflected polynomial of |I − β 1 z|:

|I − β 1 z| = π(z)

|Iz − β 1 | = π̃(z)

π̃(z) = z^p π(z⁻¹)


• Thus, the roots of π̃(z) are the inverses of the roots of π(z):

1/zi = λi

• Recall that the roots of the (reflected) characteristic polynomial |Iλ − β 1 | are the eigenvalues of the
matrix β 1 .
• Thus, we have that talking about the roots of the characteristic polynomial |I − β 1 z| is the same as
talking about the inverse of the eigenvalues of β 1 .

9.3.5 Stationary Solution


• If the roots of the characteristic polynomial |I − β 1 z| are outside the unit circle:

|zi | > 1

• or if the eigenvalues of β 1 are inside the unit circle (i.e. the roots of the (reflected) characteristic
polynomial |Iλ − β 1 |):

|λi | < 1

Then, the VAR(1) has the following stationary solution:

yt = Σ_{j=0}^∞ Cj Lʲ vt     (31)
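For the reduced VAR(1) without intercept, the MA matrices are simply Cj = β1^j, so a quick numerical sketch (assuming NumPy; β1 is a made-up stable example) is:

```python
import numpy as np

beta1 = np.array([[0.5, 0.1],
                  [0.2, 0.4]])                 # hypothetical VAR(1) coefficient matrix

print(np.abs(np.linalg.eigvals(beta1)))        # all < 1 => a stationary solution exists

# MA(infinity) matrices of the stationary solution: C_j = beta1^j
C = [np.linalg.matrix_power(beta1, j) for j in range(6)]
for j, Cj in enumerate(C):
    print(f"C_{j} =\n{Cj}")                    # entries shrink geometrically toward zero
```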

9.4 VAR(1) with n variables


• The requirement for the stationary solution is the same as in section 9.3
• The difference is the number of roots
• The characteristic polynomial |I − β1z| is of degree n
• Thus, we have n roots
• In general, the number of roots depends on the number of variables and on the lag length.

9.5 The VAR(2) model


xt = β1x xt−1 + β2x yt−1 + β3x xt−2 + β4x yt−2 + uxt
(32)
yt = β1y xt−1 + β2y yt−1 + β3y xt−2 + β4y yt−2 + uyt

9.5.1 Matrix Notation


• We can write the model in matrix notation:

(xt)   (β1x  β2x)(xt−1)   (β3x  β4x)(xt−2)   (uxt)
(yt) = (β1y  β2y)(yt−1) + (β3y  β4y)(yt−2) + (uyt)     (33)

• By defining the following vectors and matrices we end up with a very simple form for the VAR(2)
model. Let:

yt = (xt, yt)′,   vt = (uxt, uyt)′,   β1 = [β1x  β2x; β1y  β2y],   and   β2 = [β3x  β4x; β3y  β4y]

Then, the model in matrix form is given by:

yt = β 1 yt−1 + β 2 yt−2 + vt (34)

9.5.2 Solving the system of equations: Inverting the matrix


• Let’s use the lag operator:
yt = β 1 Lyt + β 2 L2 yt + vt

(I − β 1 L − β 2 L2 )yt = vt

• Let’s define:

A(L) = (I − β 1 L − β 2 L2 )

• So, we would like to be able to express the solution as:

yt = A(L)−1 vt

• To do so, first we need to check if the inverse exists. That is, if A(L) is a nonsingular matrix. So, we need to check if:
1. A(L) is a square matrix
2. |(I − β1L − β2L²)| ≠ 0, so we have:

(I − β1L − β2L²)⁻¹ = [1/|(I − β1L − β2L²)|] C′

, in which C′ is the transpose of the cofactor matrix
Condition 1 is satisfied given that β 1 and β 2 are square matrices.
• For condition 2, notice that |(I − β 1 L − β 2 L2 )| is a fourth order polynomial on L:

|(I − β1L − β2L²)| = |1 − β1xL − β3xL²   β2xL − β4xL²; β1yL − β3yL²   1 − β2yL − β4yL²|

Thus, we have that:

|(I − β1L − β2L²)| = π(L) = 1 − π1L − π2L² − π3L³ − π4L⁴     (35)

• Thus, condition 2 is the same as checking that π(L)−1 exists.


• Further, we are in a similar situation as in the usual AR(p) case. That is, we would like:
1. π(L)⁻¹ < ∞ (i.e., it exists)
2. π(L)⁻¹ has no meaning in time series by itself (i.e., how do I multiply it by the error term?). So, we would like π(z)⁻¹ to be a regular function so we can use Taylor's Theorem around 0 and express it as an infinite sum.
3. Check if the power series converges when |z| = 1 (so we know that the infinite sum times the error term (i.e., the RHS) converges in the second moment)
• We know that we can find the roots of the characteristic polynomial π(z), zi , and express it as:
    
π(z) = (1 − (1/z1)z)(1 − (1/z2)z)(1 − (1/z3)z)(1 − (1/z4)z)

• Further, we know that the characteristic function (1 − ρz)⁻¹ is regular (analytic) for all z ∈ C\L, in
which

L = {z ∈ C : Re(z) ≥ zi and Im(z) = 0}

Thus, from Taylor’s Theorem we have that π(z)−1 can be expressed as a power series with z0 ∈ C\L.
• The Taylor series expansion around zero is valid for |zi⁻¹ z| < 1

• For |z| = 1, the Taylor series expansion around zero is valid if

|zi | > 1

Otherwise, the infinite sum does not converge to the function (1 − ρz)−1
• Thus, we need the roots of the characteristic polynomial |I − β1z − β2z²| to be outside the unit circle.
• If so, we have that the solution for the system is given by:

yt = |A(L)|⁻¹ C′ vt

yt = π(L)⁻¹ C′ vt

yt = [Σ_{j=0}^∞ (1/z1)ʲ Lʲ] [Σ_{j=0}^∞ (1/z2)ʲ Lʲ] [Σ_{j=0}^∞ (1/z3)ʲ Lʲ] [Σ_{j=0}^∞ (1/z4)ʲ Lʲ] C′ vt

yt = Σ_{j=0}^∞ ψj Lʲ C′ vt

yt = Σ_{j=0}^∞ ψj Lʲ [1 − β2yL − β4yL²   −β2xL + β4xL²; −β1yL + β3yL²   1 − β1xL − β3xL²] vt

yt = Σ_{j=0}^∞ [c1j  c2j; c3j  c4j] Lʲ vt

yt = Σ_{j=0}^∞ Cj Lʲ vt

• So, we have an infinite MA representation for each variable with two errors. Using Theorem 6, we know that the solution not only converges in the second moment but is also a stationary solution.
• Therefore, if the roots of the characteristic polynomial |A(z)| are outside the unit circle, the stationary
solution of the system is given by:


yt = Σ_{j=0}^∞ Cj Lʲ vt     (36)

9.5.3 Stationary Solution


As we have seen in section 9.5.2,
• If the roots of the characteristic polynomial |I − β1z − β2z²| are outside the unit circle:

|zi | > 1

• or if the roots of the (reflected) characteristic polynomial (|Iλ2 − β 1 λ − β 2 |) are inside the unit circle:

|λi | < 1

Then, the VAR(2) has the following stationary solution:



yt = Σ_{j=0}^∞ Cj Lʲ vt     (37)

9.6 The VAR(p) model


The model in matrix form is given by:

yt = β 1 yt−1 + β 2 yt−2 + β 3 yt−3 + · · · + β p yt−p + vt (38)

Let’s define

A(L) = I − β1L − β2L² − β3L³ − · · · − βpL^p

So, the model can be rewritten as:

A(L)yt = vt

When solving the system, the difference is in the degree of the characteristic polynomial |A(z)|, which is at most n × p (the number of variables times the lag length).

9.6.1 Stationary Solution


As we have seen in section 9.5.2,
• If the roots of the characteristic polynomial |A(z)| are outside the unit circle:

|zi | > 1

• or if the roots of the (reflected) characteristic polynomial |Iλ^p − β1λ^{p−1} − · · · − βp| are inside the unit circle:

|λi | < 1

Then, the VAR(P) has the following stationary solution:



yt = Σ_{j=0}^∞ Cj Lʲ vt     (39)

, in which Cj is an n × n matrix.

10 Structural Vector Autoregressions
10.1 Motivation
• A classic question in empirical macroeconomics:
– what is the effect of a policy intervention (interest rate increase, fiscal stimulus) on macroeconomic
aggregates of interest - output, inflation, etc?
• Let Yt be a vector of macro time series.
• Let εrt denote an unanticipated monetary policy intervention.
• We want to know the dynamic causal effect of εrt on Yt :
∂Yt+h / ∂εrt ,   h = 1, 2, 3, . . .
where the partial derivative holds all other interventions constant.
• In macro, this dynamic causal effect is called the impulse response function (IRF) of Yt to the
”shock” (unexpected intervention) εrt .
• The challenge is to estimate ∂Yt+h / ∂εrt from observational macro data.
• Two conceptual approaches to estimating dynamic causal effects (IRF):
– Structural model (Cowles Commission): DSGE or SVAR
– Quasi-Experiments

10.2 The Reduced VAR


• Consider the Reduced form VAR(p) :

Yt = A1 Yt−1 + . . . + Ap Yt−p + ut

or
A(L)Yt = ut
where
A(L) = I − A1 L − A2 L2 − . . . − Ap Lp

where Ai are the coefficients from the (population) regression of Yt on Yt−1 , . . . , Yt−p .
• If ut were the shocks, then we could compute the structural IRF using the MA representation of the
VAR,

Yt = A(L)−1 ut

• But in general ut is affected by multiple shocks: in any given quarter, GDP changes unexpectedly for
a variety of reasons.
• Is there a way to identify the structural shocks?
• For that we need to find the relationship between the reduced VAR and structural VAR.

10.3 The Structural VAR
• Consider a bivariate first-order VAR model:
yt = b10 − b12 xt + γ11 yt−1 + γ12 xt−1 + εyt
xt = b20 − b21 yt + γ21 yt−1 + γ22 xt−1 + εxt

• The error terms (structural shocks) εyt and εxt are white noise innovations with standard deviations
σy and σx and a zero covariance.
• From this representation we would be able to find the IRF for the structural shocks!
• However, we have an issue:
– The two variables y and x are endogenous
– Note that shock εyt affects y directly and x indirectly.
• It is worth pointing out that here there are 10 parameters to estimate.

10.4 From a SVAR to an RVAR


• The structural VAR is not a reduced form.
• In a reduced form representation y and x are just functions of lagged y and x.
• To solve for a reduced form write the structural VAR in matrix form as:
         
(1    b12)(yt)   (b10)   (γ11  γ12)(yt−1)   (εyt)
(b21  1  )(xt) = (b20) + (γ21  γ22)(xt−1) + (εxt)

RYt = Γ0 + Γ1 Yt−1 + εt

• Premultipication by R−1 allow us to obtain a standard VAR(1) :


RYt = Γ0 + Γ1 Yt−1 + εt
Yt = R−1 Γ0 + R−1 Γ1 Yt−1 + R−1 εt
Yt = Φ0 + Φ1 Yt−1 + ut

• This is the reduced form.


• We no longer have the endogenity issue
• We can estimate the parameters by OLS equation by equation (of course, we need to check for stationarity first)
• It is worth pointing out that we are estimating 9 parameters in the reduced VAR.

10.5 The Identification Problem


• Remember that we started with a structural VAR model, and jumped into the reduced form or standard
VAR for estimation purposes.
• Is it possible to recover the parameters in the structural VAR from the estimated parameters in the
standard VAR? No!!
• There are 10 parameters in the bivariate structural VAR(1) and only 9 estimated parameters in the
standard VAR(1).
• The VAR is underidentified.

10.6 Reduced form to structure
• The Reduced VAR
A(L)Yt = ut
Yt = A(L)−1 ut = C(L)ut
A(L) = I − A1 L − A2 L2 − . . . − Ap Lp
E[ut u′t ] = Σu (unrestricted)

• The Structural VAR


B(L)Yt = εt
Yt = B(L)−1 εt = D(L)εt
B(L) = B0 − B1 L − B2 L2 − . . . − Bp Lp
 2 
σ1 0
E[εt ε′t ] = Σε = 
 .. 
. 
2
0 σk

• Because εt = Rut ,
RA(L)Yt = Rut = εt .

• Letting RA(L) = B(L), this delivers the structural VAR,

B(L)Yt = εt ,

• The MA representation of the SVAR delivers the structural IRFs:

Yt = D(L)εt

D(L) = B(L)−1 = A(L)−1 R−1


• Impulse response: ∂Yt+h / ∂εt = Dh
• Therefore, we have that:
Rut = εt
B(L) = RA(L) (B0 = R)
C(L) = A(L)−1
D(L) = C(L)R−1

10.6.1 A note about B0


• We know that B0 is the first element in the polynomial B(L).
• Since we know that B0 captures the contemporaneous relationship between the variables, we are very
tempted to say that the diagonal of B0 is full of 1s.
• However, if we do that, we are already making a normalization, which is not bad at all but you have
to take that into account.
• In general, we could have that B0 is full of parameters.
• For example, for the VAR with 2 variables:
 
B0 = [b11  b12; b21  b22]

• Recall that B0 = R

10.6.2 Identification of R
• In population, we can know A(L).
• If we can identify R, we can obtain the SVAR coefficients,

B(L) = RA(L)

.
• The question here is how can we identify R:
– Identification by Short Run Restrictions: Sims (1980)
– Identification by Long Run Restrictions: Blanchard and Quah (1989)
– Identification from Heteroskedasticity: Rigobon (2003)
– Identification by Sign Restrictions: Uhlig (2005)
– Identification by External Instruments: Stock (2007), Stock and Watson (2012); Mertens and
Ravn (2013); Gertler and P. Karadi (2014); for IV in VAR (not full method) see Hamilton (2003),
Kilian (2009).
• The answer lies on the following equations:

Rut = εt

E [ut u′t ] = Σu
E [εt ε′t ] = Σε
Σε = RΣu R′

• Notice that Σu is identified, but the not identified parameters here are Σε and R.

10.7 Identification by Short Run Restrictions


• Based on Sims (1980)
• In general, the SVAR is fully identified if

RΣu R′ = Σε

can be solved for the unknown elements of R and Σε


• Recall that Σu is identified.
• Let’s say we have k variables.
• How many parameters do we have to identify?
– To see that, let’s go back to:
Σε = RΣu R′

– we can rewrite it as unidentified on the LHS and identified on the RHS:

R−1 Σε (R′ )−1 = Σu

– Now we can clearly see that we have to identify k parameters from Σε since Σε is a diagonal
matrix.
– Further, we also need to identify the parameters inside R, that is, k × k parameters (remember that in general R does not need to have the diagonal full of 1s).
– Therefore, we have k + k² parameters to identify.
• How many parameters do we have already identified?
– The ones on the RHS
– Recall that Σu is not a diagonal matrix because the reduced shocks are correlated.
– Given the structure of a variance covariance matrix, we do not have k × k identified parameters
because the lower triangle equals the upper triangle
– We only have k(k + 1)/2 identified parameters.
• How many parameters on the LHS can be identified?
– Given the above and that the equality holds:

R−1 Σε (R′ )−1 = Σu

it must be that we can only identify k(k + 1)/2


– Thus, we have k + k(k − 1)/2 parameters that cannot be identified.
• How can we identify the remaining k + k(k − 1)/2 parameters?
– First, we impose a normalization condition that helps with the identification of k parameters. We
have two ways to do so:
1. The diagonal components of B0 are 1’s
2. The variance-covariance matrix of structural disturbances is an identity matrix.
– Now we only have k(k − 1)/2 parameters to be identified.
• How can we identify the remaining k(k − 1)/2 parameters?
– To fully understand this, you must know about Cholesky decomposition.
– If not, please refer to Appendix A
– Letting Φ0 = B0−1 , it follows that
Φ0 Σε Φ′0 = Σu

– Let P be the lower triangular matrix of the Cholesky decomposition of Σu so that

PP′ = Σu

From Φ0 Σε^{1/2} = P, it follows that

Φ0 = P Σε^{−1/2}

– Given either of the normalizations, we end up with the condition that Φ0 must be lower triangular
(it is pretty evident if you normalize such that Σε = I).
– If Φ0 is lower triangular, then R−1 is lower triangular.
– Since R = B0 , it follows that B0 is lower triangular.

– Since B0 contains the coefficients associated with the contemporaneous relationship in the struc-
tural VAR, we are imposing a structure for the Short Run dynamics between the variables.
• If B0 is lower triangular we are imposing the idea that:
– yi,t for i = 2, 3, 4, . . . , k has no contemporaneous effect on y1,t.
– The residuals u1,t are due to pure shocks to y1,t.
– yi,t for i = 3, 4, . . . , k has no contemporaneous effect on y2,t.
– yi,t for i = 4, 5, . . . , k has no contemporaneous effect on y3,t, and so on.
– All the structural shocks have contemporaneous effect on the last variable, yk,t
• This description of identification is via method of moments, however identification can equally be
described via IV, e.g. see Blanchard and Watson (1986).
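A sketch of the recursive (Cholesky) identification under the normalization Σε = I, assuming NumPy; Σu below is a made-up reduced-form covariance matrix:

```python
import numpy as np

Sigma_u = np.array([[1.0, 0.3],
                    [0.3, 0.5]])          # hypothetical reduced-form covariance (symmetric, positive definite)

P = np.linalg.cholesky(Sigma_u)           # lower triangular, P P' = Sigma_u
R = np.linalg.inv(P)                      # with Sigma_eps = I:  R = B0, since R Sigma_u R' = I

print(np.allclose(R @ Sigma_u @ R.T, np.eye(2)))   # True: structural shocks have identity covariance
print(P)                                  # impact matrix of the structural shocks (u_t = P eps_t)
```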

10.8 Identification by Long Run Restrictions


• Based on Blanchard and Quah (1989)
• This approach identifies R by imposing restrictions on the long-run effect of one or more ε ’s on one
or more Y ’s.
• What do we mean here by long-run effect?
– The long-run effect refers to the long run variance.
– The long run variance of a variable is its spectral density at zero frequency.
• What is the spectral density?
– The Spectral Density of xt is given by the Fourier Transform of the autocovariance function of xt
:
sx(ω) = γ̃(ω) = Σ_{j=−∞}^∞ γj e^{−ijω}

, in which ω ∈ (−π, π] represents the frequency.


– There is a Theorem that states the following:
Theorem 8 Consider

yt = Σ_{j=0}^∞ ψj Lʲ xt = ψ(L) xt

, in which xt is a (complex-valued) zero-mean stationary process.

∗ Then, if ψ(z) converges on the unit circle or if Σ_{j=0}^∞ ψj zʲ converges on the unit circle, we have that yt is stationary with its spectral density given by:

sy(ω) = |Σ_{j=0}^∞ ψj e^{−iωj}|² sx(ω)

sy(ω) = |ψ(e^{−iω})|² sx(ω)

– Since the Spectral density is a Fourier Transform, then we can always apply the Fourier Inverse
to find each coefficient inside the infinite sum of the Fourier Transform, which in this case is γj
(i.e., each autocovariance of xt )
– The Fourier Inverse of the Spectral density of xt gives us:
γj = (1/2π) ∫_{−π}^{π} sx(ω) e^{iωj} dω

, in which γj is the autocovariance of xt of order j for every integer j.


– For j = 0, we have that
γ0 = (1/2π) ∫_{−π}^{π} sx(ω) dω

– Thus, from an AR(P) representation for yt we have that the Spectral Density of yt is given by:
sy(ω) = |Σ_{j=0}^∞ ψj e^{−iωj}|² sε(ω)

sy(ω) = |ψ(e^{−iω})|² sε(ω)
, in which ω ∈ (−π, π] represents the frequency.
– That was for yt being univariate, but how it is if we have that Yt being multivariate?
– I will try to answer this intuitively.
– In the univariate case, we know that the Inverse Fourier of the spectral density of yt is the
autocovariance function of yt .
– If we are in the bi-variate case, we should expect that the Inverse Fourier of the Spectral Density
of Yt gives us the autocovariance function too.
– However, for this case we should get a matrix 2 × 2 as follows:
 
γy (h) γy,x (h)
=
γx,y (h) γx (h)

– Thus, it must be the case that the Spectral Density is also a matrix.
– Each element on the main diagonal is the autospectrum (i.e., the Fourier transform of the auto-
covariance function of each variable)
– While the elements on its off-diagonals are the cross-spectra between yi,t and yj,t , i ̸= j = 1, . . . , k
(i.e., the Fourier transform of the covariance function of yi,t and yj,t , i ̸= j = 1, . . . , k)
– For the Yt being multivariate, we have a modified version of the Theorem 8.
Theorem 9 Consider

Yt = A(L)−1 Xt

, in which Xt is a (complex-valued) zero-mean stationary vector.

∗ Then, if |A(z)| converges on the unit circle, we have that Yt is stationary with its Spectral
density given by:


sy (ω) = A(e−iw )−1 sx (ω)A(e−iw )−1

– Thus, for the VAR(P) for Yt , the spectral density is:


sy (ω) = A(e−iw )−1 su (ω)A(e−iw )−1

Recall we have that

Rut = εt

B(L) = RA(L)

Yt = B(L)⁻¹ εt

Thus, the spectral density can be written also as:


sy (ω) = B(e−iw )−1 sε (ω)B(e−iw )−1

– Now, remember we are interested in the long run variance, which is the spectral density at zero
frequency:


sy (0) = A(1)−1 su (0)A(1)−1


sy (0) = B(1)−1 sε (0)B(1)−1

– Given that the structural shocks are not correlated and that they are i.i.d. (the autocovariances
are zero), we have that:

sε (0) = Σε

– Therefore, we have that the long run variance of Yt is given by:


sy (0) = A(1)−1 su (0)A(1)−1


sy (0) = B(1)−1 Σε B(1)−1

Thus,

A(1)⁻¹ su(0) (A(1)⁻¹)′ = B(1)⁻¹ Σε (B(1)⁻¹)′

Using the fact that B(L) = RA(L):

A(1)⁻¹ su(0) (A(1)⁻¹)′ = (RA(1))⁻¹ Σε ((RA(1))⁻¹)′

Again we need to check the number of parameters that must be identified.


∗ All parameters on the LHS are identified
∗ Recall that the LHS is the long run variance of Yt.
∗ Given the structure of a variance covariance matrix, we do not have k×k identified parameters
because the lower triangle equals the upper triangle.
∗ We only have k(k + 1)/2 identified parameters.
∗ From the RHS, we can see that we have to identify k parameters from Σε since Σε is a diagonal matrix.
∗ Further, we also need to identify the parameters inside R, that is, k × k parameters (remember that in general R does not need to have the diagonal full of 1s).
∗ Therefore, we have k + k 2 parameters to identify.
∗ Given that the above equality holds, we have k + k(k − 1)/2 parameters that cannot be
identified.
∗ If we normalize the variance-covariance matrix of structural disturbances (i.e., it is an identity
matrix), we only have k(k − 1)/2 parameters to be identified.
• How do we identify the remaining k(k − 1)/2?
– Recall that for IRF, we have that Yt = D(L)εt
– Thus, D(L) = B(L)−1
– We identify the remaining k(k − 1)/2 using the long run neutrality restriction.
– The main way long restrictions are implemented in practice is by setting Σε = I and imposing
zero restrictions on D(1).
– Imposing Dij(1) = 0 says that the long-run effect on the i-th element of Yt of the j-th element of εt is zero
– If Σε = I, the moment equation above can be rewritten,

sy (0) = D(1)D(1)′

where D(1) = B(1)−1 .


– Because RA(1) = B(1), R is obtained from D(1) as R = B(1)A(1)⁻¹ = D(1)⁻¹A(1)⁻¹, and B(L) = RA(L) as above.
– If the zero restrictions on D(1) make D(1) lower triangular, then D(1) is the Cholesky factorization
of sy (0).
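A sketch of this long-run (Blanchard–Quah type) identification with Σε = I, assuming NumPy; A1 stands for A(1) = I − A1 − ⋯ − Ap and Sigma_u for the reduced-form covariance, both made-up illustrative values:

```python
import numpy as np

A1 = np.array([[0.6, -0.1],
               [0.2,  0.7]])              # hypothetical A(1) = I - A_1 - ... - A_p
Sigma_u = np.array([[1.0, 0.2],
                    [0.2, 0.8]])          # hypothetical reduced-form covariance

C1 = np.linalg.inv(A1)                    # C(1) = A(1)^{-1}: long-run MA matrix
S_lr = C1 @ Sigma_u @ C1.T                # long-run variance s_y(0) of Y_t (up to scaling)
D1 = np.linalg.cholesky(S_lr)             # lower-triangular D(1): zero long-run effect of shock 2 on variable 1

Theta0 = A1 @ D1                          # impact matrix of the structural shocks, i.e. R^{-1} = C(1)^{-1} D(1)
R = np.linalg.inv(Theta0)
print(np.allclose(R @ Sigma_u @ R.T, np.eye(2)))   # True: implied structural shocks have identity covariance
```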

10.9 Identification from Heteroskedasticity


• Based on Rigobon (2003)
• Let’s assume that:
1. The variance of the structural shock is such that it has structural break.

2. The structural shock variance breaks at date s : Σε,1 before, Σε,2 after.
3. R doesn’t change between variance regimes.
• Let’s normalize R to have 1’s on the diagonal.
• Thus the unknowns are:
– in R we would have had k × k unknown parameters, but given the normalization of 1’s on the
diagonal, we only have : k 2 − k unknown parameters
– Σε,1 is a diagonal matrix, so we have k unknown parameters
– Σε,2 is a diagonal matrix, so we have k unknown parameters
• Recall that Rut = εt , so by looking at the variance, we have that:
First period: RΣu,1 R′ = Σε,1
Second period: RΣu,2 R′ = Σε,2
We can rewrite the above equations such that the identified parameters are on the LHS and the
unidentified ones are on the RHS:
First period: Σu,1 = R⁻¹ Σε,1 (R′)⁻¹
Second period: Σu,2 = R⁻¹ Σε,2 (R′)⁻¹
Thus, we have:
– For the first period, on the LHS we have k(k + 1)/2 identified parameters
– For the second period, we have another k(k + 1)/2 identified parameters.
– On the RHS from both periods we have k² − k unidentified parameters for R, k unidentified parameters for Σε,1, and k unidentified parameters for Σε,2
– In total, we have k(k + 1) identified parameters, and k 2 + k unidentified parameters.
• There is a rank condition here too - for example, identification will not be achieved if Σε,1 and Σε,2
are proportional.
• The break date need not be known as long as it can be estimated consistently
• Different intuition: suppose only one structural shock is homoskedastic. Then find the linear combi-
nation without any heteroskedasticity!

11 Cointegrated VAR
11.1 Motivation
• Many economic variables are not stationary, and we consider the type of non-stationarity that can be
removed by differencing.
• Let’s think about the reduced VAR in first differences.
• Is it always the right approach if all the variables are I(1)?
• What if the variables share a long-run relationship?
• That is, a relationship that is stable in the long run.
• Wouldn’t it make sense that this stable relationship should be a regressor in the VAR in first differences?
• In the short-run dynamics, the movements of the variables might be guided by a long-run relationship.

11.2 I(0) process
• Let in the following ϵt be a sequence of independent identically distributed p -dimensional random
variables with mean zero and variance matrix Ω.
• A linear process defined by


Yt = Σ_{j=0}^∞ Cj εt−j

is I(0) if:
1. Σ_{j=0}^∞ Cj zʲ is convergent for |z| ≤ 1
2. Σ_{j=0}^∞ Cj ≠ 0

11.3 I(d) process


A stochastic process X is called integrated of order d, I(d), d = 0, 1, 2, . . . if

∆d Xt

is I(0). An important aspect of this definition is that it is enough to have the infinite sum of the errors not converging for one element of Xt, thus allowing the component processes to be integrated of different orders. Remember that in general we have p infinite sums, one for each element of Xt.

11.4 Cointegrated process with cointegrating vector β


• Let X,be integrated of order 1.
• We call X, cointegrated with cointegrating vector β ̸= 0 if:

β ′ Xt

can be made stationary by a suitable choice of its initial distribution.


• The cointegrating rank is the number of linearly independent cointegrating relations
• The space spanned by the cointegrating relations is the cointegrating space.
• Note that β ′ Xt need not be I(0)
• However, for AR processes the cointegrating relations we find are in fact I(0)

11.4.1 Example
• Consider the following process:

X1t = Σ_{i=1}^t ε1i + ε2t
X2t = a Σ_{i=1}^t ε1i + ε3t
X3t = ε4t

• Clearly X3t is I(0), but the vector process Xt = (X1t, X2t, X3t) is an I(1) process since the other two elements have infinite sums that do not converge (both have one unit root)

• It has two cointegrating vectors:

β1 = (a, −1, 0)

β2 = (0, 0, 1)
To see this, notice that:

β1′ Xt = a Σ_{i=1}^t ε1i + a ε2t − a Σ_{i=1}^t ε1i − ε3t + 0

β1′ Xt = a ε2t − ε3t

, which is stationary.
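A small simulation of this example (assuming NumPy; a is set to an arbitrary value), checking that β1′Xt is indeed the stationary combination aε2t − ε3t:

```python
import numpy as np

rng = np.random.default_rng(0)
T, a = 1000, 2.0
e1, e2, e3, e4 = rng.standard_normal((4, T))

common = np.cumsum(e1)                  # shared stochastic trend: sum of eps_{1i} up to t
X1 = common + e2
X2 = a * common + e3
X3 = e4

beta1 = np.array([a, -1.0, 0.0])        # cointegrating vector
combo = beta1 @ np.vstack([X1, X2, X3])
print(np.allclose(combo, a * e2 - e3))  # True: the common trend cancels, leaving a stationary series
```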

11.5 Spurious Regression


• For a random walk, unconditional population moments (i.e. which don’t depend on time t ), such as
E[X], don’t exist. (In some loose sense, they are infinite.)
• Similarly, the unconditional Covariance between two independent random walks is zero.

• Let Yt = (y1t , . . . , ynt ) denote an (n × 1) vector of I(1) time series that are not cointegrated.
Write

Yt = (y1t, Y2t′)′

and consider the regression of y1t on Y2t, giving

y1t = β̂2′ Y2t + ût

• Let’s assume y1t is not cointegrated with Y2t


• Then, ût ∼ I(1).
• If so, the above is a spurious regression
• The regression is spurious when we regress one random walk onto another independent random walk.
• It is spurious because the regression will most likely indicate a non-existing relationship:
1. The true value of β 2 is zero.
2. The coefficient estimate will not converge toward zero (the true value). Instead, in the limit the
coefficient estimate will follow a non-degenerate distribution.
3. The t value most often is significant.
4. R2 is typically very high.
• Lesson: just because two series move together does not mean they are related!
• Lesson: use extra caution when you run regression using nonstationary variables; be aware of the
possibility of spurious regression! Check whether the residual is nonstationary.
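A sketch of the spurious-regression phenomenon (assuming NumPy and statsmodels are available), regressing one independent random walk on another:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 500
y = np.cumsum(rng.standard_normal(T))    # random walk 1
x = np.cumsum(rng.standard_normal(T))    # independent random walk 2

res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.tvalues[1], res.rsquared)       # t on x is typically "significant" and R^2 high, despite no relationship
# Checking the residual for a unit root (e.g., with adfuller) would reveal the regression is spurious.
```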

11.6 Cointegrated VAR(1)
To get some intuition of the Error Correction Model, we will start by analyzing the simple case of a cointe-
grated VAR(1)

11.6.1 Finding the integrated order of the process


• Let the VAR(1) model be:

yt = β 1 yt−1 + vt

• We know, that if the roots of the characteristic polynomial |I − β 1 z| are outside the unit circle, then
the process has a stationary solution:

yt = Σ_{j=0}^∞ [c1j  c2j; c3j  c4j] Lʲ vt

• Thus, the process is I(0)


• Now, let's suppose we have a matrix β1 that generates a unit root (e.g., c1j ≥ 1; remember that the unit root is seen directly in having one of the elements in ψj equal to 1), so the infinite sum never converges (i.e., the infinite sum times the error term (the RHS) does not converge in the second moment).
• Is the process I(1)? Yes, since:

yt = β1 yt−1 + vt

A(L) yt = vt

yt = [1/(1 − π1L − π2L²)] C′ vt

yt = (1 − (1/z1)L)⁻¹ (1 − (1/z2)L)⁻¹ C′ vt

Let's suppose that z1 = 1, so we have one unit root. Then we have that:

yt = (1 − L)⁻¹ (1 − (1/z2)L)⁻¹ C′ vt

TRICK: How do we get rid of the unit root?

(1 − L) yt = (1 − (1/z2)L)⁻¹ C′ vt

∆yt = (1 − (1/z2)L)⁻¹ C′ vt

Given that |z2| > 1, the RHS does converge in the second moment. Thus, we have that

∆yt = Σ_{j=0}^∞ C̃j vt−j

and it is stationary. So ∆yt is I(0).

11.6.2 Is it cointegrated?
• Let’s assume that there is a vector β such that β ′ yt is stationary, then yt is cointegrated

11.7 Rank decomposition


• Before going to the ECM, it is worth knowing about rank factorization
• Given an m × n matrix A of rank r, a rank decomposition or rank factorization of A is a factorization
of A of the form

A = CF

, where C is an m × r matrix and F is an r × n matrix.


• The rank of the matrix is the number of columns or rows that are linearly independent
• Every finite-dimensional matrix has a rank decomposition
• The rank of a matrix can be found by finding the eigenvalues of the matrix that are nonzero.
• A matrix whose rank is equal to its dimensions is called a full rank matrix.
• When the rank of a matrix is smaller than its dimensions, the matrix is called rank-deficient, singular,
or multicolinear.
• Only full rank matrices have an inverse.
• In practice, we can construct one specific rank factorization as follows: we can compute B, the reduced
row echelon form of A. Then C is obtained by removing from A all non-pivot columns, and F by
eliminating all zero rows of B.

11.8 The Vector Error Correction model


• The vector error correction model is an alternative way to express the first difference of a cointegrated VAR
• Introduced by Granger (1986)
• The idea comes from an alternative way to write the first difference of an I(1) process:

yt − yt−1 = β 1 yt−1 − yt−1 + vt

∆yt = (β 1 − I)yt−1 + vt

Let’s define:

Π = −(I − β 1 )

Then, we have that:

∆yt = Πyt−1 + vt (40)

• From section 11.6.1, we know that ∆yt is stationary, and by definition vt is stationary.
• We also know that yt−1 is I(1)
• Thus, for (40) to be true, it must be that either:
– Πyt−1 is stationary
– or Π = 0
• Given the dimensions of β 1 , Π is a 2 × 2 matrix
• It is tempting to conclude immediately that Πyt−1 is the cointegrating relationship, but that is not quite right.
• Let’s use the rank decomposition of Π:

Π = αβ ′

, in which α is 2 × r, β is 2 × r, and r is the rank of the matrix Π.


• Then, we will say that β ′ is the matrix that contains the r cointegrated vectors
• Then, the rank of the matrix Π determines the number of cointegrated vectors
• If the rank of Π is zero, it means that Π must be a zero matrix since the zero matrix is the only matrix
whose rank is 0. That implies that there are no cointegrated vectors for yt
• If Π is full rank, then we have that it can be inverted so:

∆yt = Πyt−1 + vt

Π−1 ∆yt = yt−1 + Π−1 vt

Given that Π−1 is a finite constant matrix, we still have that Π−1 ∆yt and Π−1 vt are stationary.
Therefore, the only way in which the above equation can be true is if yt−1 is stationary, which is a
contradiction given that it is I(1). Thus:

0≤r<n

• So, the rank of the matrix Π is what we should try to find if we are looking for the number of
cointegrated vectors
• If yt is I(1), we can always express the Error correction model as follows:

∆yt = Πyt−1 + vt (41)

, in which Π = αβ ′ , α is a 2 × r matrix that contains the adjustment vectors (toward the long-run equilibrium), β is a 2 × r matrix that contains the cointegrating vectors (the long-run equilibrium), and r is the rank of the matrix Π.
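• To make the algebra concrete, the sketch below (illustrative; β 1 is the same hypothetical matrix used in the simulation sketch above, with characteristic roots z = 1 and z = 2) builds Π = −(I − β 1 ) and inspects its rank and eigenvalues:

```python
import numpy as np

# Hypothetical bivariate VAR(1) coefficient matrix with exactly one unit root
B1 = np.array([[1.0, 0.0],
               [0.5, 0.5]])

Pi = -(np.eye(2) - B1)                 # Pi = -(I - beta_1)

print(np.linalg.matrix_rank(Pi))       # 1 -> one cointegrating vector
print(np.linalg.eigvals(Pi))           # one zero and one nonzero eigenvalue
```

With rank one, Π factors as αβ ′ with α and β each 2 × 1; here β ′ = (0.5, −0.5) up to scale, so y1t − y2t is the stationary linear combination.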

11.9 Cointegrated VAR(p)
11.9.1 VECM representation
• Let’s assume the process yt is I(1) with n elements
• Thus, ∆yt is I(0)
• Whether or not there are cointegrating vectors, the VECM representation is valid, since the rank
of Π tells us the number of cointegrating relationships
• The idea is similar to the one in section 9.6 but with some extra tricks
• The goal is to get a system of equations in first differences such that there exists an explicit Π matrix
whose rank tells us the number of cointegrating relationships
• From the VAR(p), let’s subtract yt−1 on both sides so we get the first difference of the process on the
LHS:

yt − yt−1 = β 1 yt−1 − yt−1 + β 2 yt−2 + β 3 yt−3 + · · · + β p yt−p + vt

∆yt = (β 1 − I)yt−1 + β 2 yt−2 + β 3 yt−3 + · · · + β p yt−p + vt

We would like to argue the same logic as in VAR(1), but it is not possible since the above equation
has the variables in levels, which are I(1), and we need all the variables in the VECM to be I(0).
• TRICK: To fix the problem mentioned above, we proceed to use the following trick:

yt−1 = yt − ∆yt

Thus, we proceed to transform all the variables in levels such that they can be expressed in first
differences:

∆yt = (β 1 − I)yt−1 + β 2 (yt−1 − ∆yt−1 ) + β 3 (yt−2 − ∆yt−2 ) + · · · + β p (yt−(p−1) − ∆yt−(p−1) ) + vt

Thus, we now have some of the variables in first differences, but others are still in levels. We therefore
continue substituting until the only variable left in levels is yt−1 .9
• By doing so, we end up with the VECM representation of a VAR(p):

∆yt = Πyt−1 + Γ1 ∆yt−1 + Γ2 ∆yt−2 + · · · + Γp−1 ∆yt−(p−1) + vt (42)


, in which:

$$ \Gamma_j = -\sum_{i=j+1}^{p} \beta_i, \qquad j = 1, \ldots, p-1 $$

$$ \Pi = -\left(I - \beta_1 - \cdots - \beta_p\right) $$
9 e.g., yt−3 = yt−2 − ∆yt−2 = yt−1 − ∆yt−1 − ∆yt−2

• In the VECM representation, we have that the variable on the LHS is I(0). Thus, for (42) to be true,
we require all the elements on the RHS to be I(0), so Πyt−1 must be such that it contains all the
cointegrating vectors.
• So, using rank factorization, we have that:

Π = αβ ′

, in which Π is an n × n matrix, α is an n × r matrix, β is an n × r matrix, and r is the rank of Π. So the
rank of Π tells us the number of cointegrating vectors.
• The rank of Π is such that

0≤r<n

since if it is full rank, all the variables in levels would be I(0) (see explanation in section 9.6). If the
rank is zero, then Π is a zero matrix and the variables are not cointegrated.
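• A short sketch of this mapping (illustrative; the bivariate VAR(2) coefficient matrices below are hypothetical and chosen so that Π has rank one):

```python
import numpy as np

# Hypothetical bivariate VAR(2) coefficients
B = [np.array([[0.6, 0.1],
               [0.3, 0.3]]),
     np.array([[0.4, -0.1],
               [0.2, 0.2]])]
n, p = B[0].shape[0], len(B)

Pi = -(np.eye(n) - sum(B))                         # Pi = -(I - B_1 - ... - B_p) = -A(1)
Gamma = [-sum(B[j + 1:]) for j in range(p - 1)]    # Gamma_j = -(B_{j+1} + ... + B_p)

print(np.linalg.matrix_rank(Pi))                   # 1 -> one cointegrating relationship (if y_t is I(1))
print(np.linalg.eigvals(Pi))                       # one nonzero and one zero eigenvalue
```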

11.10 The rank of Π


• To find the rank of Π, we just have to find the number of eigenvalues that are nonzero
• Notice that


$$ \Pi = -\left(I - \beta_1 - \cdots - \beta_p\right) $$

Recall that from our VAR(p) in levels, we have that:

A(L) = I − β 1 L − β 2 L2 − . . . − β p Lp


So, there is a relationship between the lag polynomial from the VAR(p) in levels and the Π matrix in
the VECM representation:

Π = −A(1) (43)

• So finding the rank of Π amounts to finding the number of nonzero eigenvalues of −A(1):

$$ |\Pi - \lambda I| = |-A(1) - \lambda I| \qquad (44) $$

• Furthermore, from the VAR(p) analysis in the previous section, we know that the first condition for
solving the system is that A(L)−1 exists. To ensure that, we require |A(L)| ≠ 0.
• Given the explicit form of A(L), we know that |A(L)| is a polynomial of degree p × n in L.
• So, for A(L)−1 to exist we need the characteristic polynomial |A(z)| to not be equal to zero:

|A(z)| = π(z) = 1 − π1 z − π2 z 2 − · · · − πp×n z p×n ̸= 0 (45)

• Using the Fundamental Theorem of Algebra, we know, that we can express the above polynomial
in terms of its roots, zi :
    
$$ |A(z)| = \pi(z) = \left(1 - \frac{1}{z_1} z\right)\left(1 - \frac{1}{z_2} z\right)\cdots\left(1 - \frac{1}{z_{p \times n}} z\right) \neq 0 \qquad (46) $$

• We know that Taylor series expansion is valid for each element in the multiplication for |z| ≤ 1 if
|zi | > 1
• Now, notice that z = 1 is just one particular case such that |z| = 1, but it is relevant in this case since
even for this trivial case, for the stationary solution to exist we need |zi | > 1.
• Now, remember that −A(1) = Π, so:

| − A(1)| = |Π|

and given that Π cannot be full rank (i.e. Π−1 does not exist, the matrix is singular) if the variables
are I(1) (see section 9.7.1). So it must be that

|Π| = 0

, which is the same as:

|A(1)| = 0

, which happens if the characteristic polynomial |A(z)| evaluated at z = 1 is such that:


    
$$ |A(1)| = \pi(1) = \left(1 - \frac{1}{z_1}\right)\left(1 - \frac{1}{z_2}\right)\cdots\left(1 - \frac{1}{z_{p \times n}}\right) = 0 \qquad (47) $$

, which happens if at least one of the roots is 1:

zi = 1

• Thus, having at least one unit root in the characteristic polynomial |A(z)| ensures that Π is not a full
rank matrix and the VECM representation is valid (i.e., the rank of Π can be zero but it is not full
rank so the variables are I(1)).
• Finally, if at least one root of the characteristic polynomial |A(z)| is such that

|zi | < 1

then, for that root zi , the Taylor expansion of $\left(1 - \frac{1}{z_i} z\right)^{-1}$ is not valid for |z| = 1. Thus, |A(z)|−1 cannot be expanded as a Taylor series, because the corresponding infinite sum does not converge for |z| = 1 (i.e. it is infinite).

• Notice that having |zi | < 1 for the characteristic polynomial |A(z)| does not make |A(1)| = 0, but we
would have that:

Π−1 = |Π|−1 C ′

Π−1 = |A(1)|−1 C ′

$$ \Pi^{-1} = \left(1 - \frac{1}{z_1}\right)^{-1}\left(1 - \frac{1}{z_2}\right)^{-1}\cdots\left(1 - \frac{1}{z_{p \times n}}\right)^{-1} C' $$

in which C ′ is the transpose of the cofactor matrix. Now, for the root such that |zi | < 1, the Taylor series expansion of $\left(1 - \frac{1}{z_i} z\right)^{-1}$ is not valid for |z| = 1. Thus, |A(z)|−1 is not convergent for |z| = 1:

$$ |A(z)|^{-1} = \left(1 - \frac{1}{z_1} z\right)^{-1}\left(1 - \frac{1}{z_2} z\right)^{-1}\cdots\left(1 - \frac{1}{z_{p \times n}} z\right)^{-1} $$

, in which each element is expanded using Taylor’s Theorem so we get an infinite sum for each element.
However, for |z| = 1, the infinite sum linked with the explosive root, |zi | < 1, is not convergent (i.e. it
is infinity). Thus, for |z| = 1 which includes z = 1:

|A(1)|−1 = ∞

Π−1 = ∞C ′ = ∞

Thus, Π−1 does not exist even though |A(1)| ≠ 0.
• Therefore, if one root in the characteristic polynomial |A(z)| is such that it is inside the unit circle:

|zi | < 1

, we have that Π−1 does not exist, so Π is a singular matrix and it does not have full rank:

r<n

• Recall that X is I(1) if it has 1 unit root, I(2) if it has 2 unit roots. So the above case does not fall
into the category of Integrated process because it has an explosive root but not a unit root.

12 Estimation of the Cointegrated VAR(p)


12.1 Residual-Based Tests for Cointegration
• Let the (n × 1) vector Yt be I(1).

• Recall, Yt is cointegrated with 0 < r < n cointegrating vectors if there exists an (r × n) matrix B′
such that

$$ B' Y_t = \begin{pmatrix} \beta_1' Y_t \\ \vdots \\ \beta_r' Y_t \end{pmatrix} = \begin{pmatrix} u_{1t} \\ \vdots \\ u_{rt} \end{pmatrix} \sim I(0) $$

• Cointegration tests cover two situations:


1. There is at most one cointegrating vector
2. There are possibly 0 ≤ r < n cointegrating vectors.
• Based on Engle and Granger (1987)
• They developed a simple two-step residual-based testing procedure based on regression techniques.
• Engle and Granger’s two-step procedure for determining if the (n×1) vector β is a cointegrating vector
is as follows:
1. Form the cointegrating residual β ′ Yt = ut
2. Perform a unit root test on ut to determine if it is I(0).
• The null hypothesis in the Engle-Granger procedure is no-cointegration (i.e., ut has a unit root) and
the alternative is cointegration.
• There are two cases to consider:
1. The proposed cointegrating vector β is pre-specified:
– For example, economic theory may imply specific values for the elements in β such as β =
(1, −1)′ .
– The hypotheses to be tested are H0 : ut = β ′ Yt ∼ I(1) (no cointegration) vs. H1 : ut = β ′ Yt ∼ I(0) (cointegration)
– Any unit root test statistic may be used to evaluate the above hypotheses.
– Tests for cointegration using a pre-specified cointegrating vector are generally much more
powerful than tests employing an estimated vector.
2. The proposed cointegrating vector is estimated from the data and an estimate of the cointegrating

residual β̂ ′ Yt = ût is formed:
– Since β is unknown, to use the Engle-Granger procedure it must be first estimated from the
data.
– Before β can be estimated some normalization assumption must be made to uniquely identify
it.
– A common normalization is to specify Yt = (y1t , Y′2t )′ where Y2t = (y2t , . . . , ynt )′ is an
((n − 1) × 1) vector and the cointegrating vector is normalized as β = (1, −β ′2 )′ .
– Engle and Granger propose estimating the normalized cointegrating vector β 2 by least squares
from the regression
y1t = γ ′ Dt + β ′2 Y2t + ut
Dt = deterministic terms
and testing the no-cointegration hypothesis with a unit root test using the estimated cointe-
grating residual
ût = y1t − γ̂ ′ Dt − β̂ ′2 Y2t

– Phillips and Ouliaris (1990) show that ADF and PP unit root tests applied to the estimated
cointegrating residual do not have the usual Dickey-Fuller distributions under the null hy-
pothesis of no-cointegration.
– Due to the spurious regression phenomenon under the null hypothesis, the distribution of the
ADF and PP unit root tests have asymptotic distributions that depend on:
∗ The deterministic terms in the regression used to estimate β 2
∗ The number of variables, n − 1, in Y2t
– Hansen (1992):
∗ The asymptotic distributions of standard cointegration test statistics are shown to depend
both upon regressor trends and estimation detrending methods.
∗ It is suggested that trends be excluded in the levels regression for maximal efficiency.
∗ Fully modified test statistics are asymptotically chi-square.
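• The following sketch illustrates both cases with simulated data (illustrative only; the DGP, the seed, and the pre-specified vector β = (1, −2)′ are hypothetical choices), using statsmodels' adfuller and coint functions:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, coint

# Two I(1) series sharing one common stochastic trend, so beta = (1, -2)' is cointegrating
rng = np.random.default_rng(0)
trend = np.cumsum(rng.standard_normal(400))
y1 = trend + rng.standard_normal(400)
y2 = 0.5 * trend + rng.standard_normal(400)

# Case 1: pre-specified vector beta = (1, -2)': run a unit root test directly on u_t
u = y1 - 2.0 * y2
print(adfuller(u)[1])                 # small p-value -> reject the unit root -> cointegration

# Case 2: beta estimated from the data: Engle-Granger two-step test via coint()
tstat, pval, crit = coint(y1, y2, trend="c")
print(tstat, pval)                    # critical values account for the estimated vector
```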

12.2 Dynamic OLS


• Stock and Watson (1993) suggest adding p leads and lags of ∆Y2t :
$$ y_{1t} = \gamma' D_t + \beta_2' Y_{2t} + \sum_{j=-p}^{p} \psi_j' \Delta Y_{2t+j} + u_t $$
$$ \;\;\; = \gamma' D_t + \beta_2' Y_{2t} + \psi_0' \Delta Y_{2t} + \psi_p' \Delta Y_{2t+p} + \cdots + \psi_1' \Delta Y_{2t+1} + \psi_{-1}' \Delta Y_{2t-1} + \cdots + \psi_{-p}' \Delta Y_{2t-p} + u_t $$

• Estimate the augmented regression by least squares.


• The resulting estimator of β 2 is called the dynamic OLS estimator and is denoted β̂ 2,DOLS .

• β̂ 2,DOLS is consistent, asymptotically normally distributed and efficient (equivalent to MLE) under
certain assumptions (see Stock and Watson (1993))
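• A minimal pandas/statsmodels sketch of the DOLS regression (illustrative; the function name dols, the choice Dt = constant, and p = 2 are assumptions, not from the text):

```python
import pandas as pd
import statsmodels.api as sm

def dols(y1: pd.Series, Y2: pd.DataFrame, p: int = 2):
    """Dynamic OLS sketch: regress y1 on a constant, Y2, and p leads and lags of dY2."""
    df = pd.concat([y1.rename("y1"), Y2.add_prefix("lvl_")], axis=1)
    dY2 = Y2.diff().add_prefix("d_")
    for j in range(-p, p + 1):                    # j < 0 are leads, j > 0 are lags
        df = df.join(dY2.shift(j).add_suffix(f"_shift{j}"))
    df = df.dropna()
    X = sm.add_constant(df.drop(columns="y1"))    # D_t is just a constant in this sketch
    res = sm.OLS(df["y1"], X).fit()
    return res.params.filter(like="lvl_")         # the DOLS estimates of beta_2
```

For inference on β̂ 2,DOLS , HAC (e.g. Newey-West) standard errors are typically used; in statsmodels these can be requested via fit(cov_type="HAC", cov_kwds={"maxlags": ...}).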

12.3 Johansen’s Methodology for Modeling Cointegration


The basic steps in Johansen’s methodology are:
1. Specify and estimate a VAR(p) model for Yt .
2. Construct likelihood ratio tests for the rank of Π to determine the number of cointegrating vectors.
3. If necessary, impose normalization and identifying restrictions on the cointegrating vectors.
4. Given the normalized cointegrating vectors estimate the resulting cointegrated VECM by maximum
likelihood.

12.3.1 Likelihood Ratio Tests for the Number of Cointegrating Vectors


• The unrestricted cointegrated VECM is denoted H(r).
• The I(1) model H(r) can be formulated as the condition that the rank of Π is less than or equal to r.

• This creates a nested set of models
H(0) ⊂ · · · ⊂ H(r) ⊂ · · · ⊂ H(n)
H(0) = non-cointegrated VAR
H(n) = stationary VAR(p)

• This nested formulation is convenient for developing a sequential procedure to test for the number r
of cointegrating relationships.
• Johansen formulates likelihood ratio (LR) statistics for the number of cointegrating relationships as
LR statistics for determining the rank of Π.
• Recall, the rank of Π is equal to the number of non-zero eigenvalues of Π.
• Thus, these LR tests are based on the estimated eigenvalues λ̂1 > λ̂2 > · · · > λ̂n of the matrix Π.
• Johansen derived two statistic tests for the number of cointegrating vectors:
1. Trace Statistic
2. Maximum Eigenvalue Statistic

12.3.2 Johansen’s Trace Statistic


• Johansen’s LR statistic tests the nested hypotheses

H0 (r0 ) : r = r0 vs. H1 (r0 ) : r > r0

• The LR statistic, called the trace statistic, is given by


$$ LR_{trace}(r_0) = -T \sum_{i=r_0+1}^{n} \ln\left(1 - \hat{\lambda}_i\right) $$

• The asymptotic null distribution of LRtrace (r0 ) is not chi-square but instead is a multivariate version
of the Dickey-Fuller unit root distribution which depends on the dimension n − r0 and the specification
of the deterministic terms.
• Sequential Procedure for Determining the Number of Cointegrating Vectors:
– First test H0 (r0 = 0) against H1 (r0 > 0).
– If this null is not rejected then it is concluded that there are no cointegrating vectors among the
n variables in Yt .
– If H0 (r0 = 0) is rejected then it is concluded that there is at least one cointegrating vector and
proceed to test H0 (r0 = 1) against H1 (r0 > 1).
– If this null is not rejected then it is concluded that there is only one cointegrating vector.
– If H0 (r0 = 1) is rejected then it is concluded that there are at least two cointegrating vectors.
– The sequential procedure is continued until the null is not rejected.

12.3.3 Johansen’s Maximum Eigenvalue Statistic
• Johansen also derives a LR statistic for the hypotheses

H0 (r0 ) : r = r0 vs. H1 (r0 ) : r = r0 + 1

• The LR statistic, called the maximum eigenvalue statistic, is given by


 
$$ LR_{max}(r_0) = -T \ln\left(1 - \hat{\lambda}_{r_0+1}\right) $$

• As with the trace statistic, the asymptotic null distribution of LRmax (r0 ) is not chi-square but instead is
a complicated function which depends on the dimension n−r0 and the specification of the deterministic
terms.
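• Both statistics can be computed, for instance, with statsmodels' coint_johansen function. The sketch below is illustrative (the simulated DGP and the options det_order=0 and k_ar_diff=1 are assumptions for the example; attribute names are those of the statsmodels result object):

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Two simulated I(1) series sharing one common stochastic trend, so the true rank is r = 1
rng = np.random.default_rng(0)
trend = np.cumsum(rng.standard_normal(500))
Y = np.column_stack([trend + rng.standard_normal(500),
                     0.5 * trend + rng.standard_normal(500)])

jres = coint_johansen(Y, det_order=0, k_ar_diff=1)
print(jres.lr1, jres.cvt)   # trace statistics and their 90/95/99% critical values
print(jres.lr2, jres.cvm)   # maximum eigenvalue statistics and critical values
print(jres.eig)             # estimated eigenvalues, ordered from largest to smallest
```

The sequential procedure described above is then applied by comparing each statistic with its critical value, starting from r0 = 0.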

12.3.4 Estimation of the Cointegrated VECM


• Johansen suggests applying Maximum Likelihood Estimation (MLE) for the Error Correction Model,
which is a restricted regression

$$ \Delta Y_t = \Phi D_t + \alpha \beta' Y_{t-1} + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t $$

, where Dt denotes the deterministic components.


• This regression is restricted in the sense that it imposes the hypothesis that the rank of Π equals r.
• The estimates of β are obtained using a special technique called reduced rank regression.
• Unlike the Engle-Granger Two-Step procedure, Johansen method estimates α and β simultaneously.
• The MLEs of the remaining parameters are obtained by least squares estimation of

$$ \Delta Y_t = \Phi D_t + \alpha \hat{\beta}_{mle}' Y_{t-1} + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t $$
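• In practice this restricted MLE is available, for instance, through statsmodels' VECM class. The sketch below is illustrative (the simulated data and the options coint_rank=1, k_ar_diff=1, and deterministic="n" are assumptions for the example):

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import VECM

# Simulated bivariate system with one cointegrating relationship
rng = np.random.default_rng(1)
trend = np.cumsum(rng.standard_normal(500))
Y = np.column_stack([trend + rng.standard_normal(500),
                     0.5 * trend + rng.standard_normal(500)])

res = VECM(Y, k_ar_diff=1, coint_rank=1, deterministic="n").fit()
print(res.alpha)   # adjustment coefficients, n x r
print(res.beta)    # cointegrating vectors,   n x r
print(res.gamma)   # short-run coefficients on the lagged differences
```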

12.3.5 Reduced Rank Regression


• The reduced rank regression model is a multivariate regression model with a coefficient matrix with
reduced rank.
• The reduced rank regression algorithm is an estimation procedure which estimates the reduced rank
regression model.
• It is related to canonical correlations10 and involves calculating eigenvalues and eigenvectors.
• Reduced rank regression model:
– We consider the multivariate regression of Y on X and Z:

Yt = ΠXt + ΓZt + εt , t = 1, . . . , T
10 In statistics, canonical-correlation analysis (CCA), also called canonical variates analysis, is a way of inferring information
from cross-covariance matrices. If we have two vectors X = (X1 , . . . , Xn ) and Y = (Y1 , . . . , Ym ) of random variables, and there
are correlations among the variables, then canonical correlation analysis will find linear combinations of X and Y which have
maximum correlation with each other.

– Y is n × 1
– X is n × 1
– Z is of dimension k × 1
– The hypothesis that Π has reduced rank less than or equal to r is expressed as

Π = αβ ′

– α is n × r
– β is n × r
– r<n
• Reduced rank regression algorithm:
– In order to describe the algorithm, we introduce the notation for product moments
$$ S_{yx} = T^{-1} \sum_{t=1}^{T} Y_t X_t' $$

$$ S_{yx.z} = S_{yx} - S_{yz} S_{zz}^{-1} S_{zx} $$

– The algorithm consists of the following steps:


1. Recall that the Frisch-Waugh theorem shows that one can partial out the parameter Γ. Thus,
first, regress Y and X on Z and form the residuals:

$$ (Y \mid Z)_t = Y_t - S_{yz} S_{zz}^{-1} Z_t $$
$$ (X \mid Z)_t = X_t - S_{xz} S_{zz}^{-1} Z_t $$

and product moments:

$$ S_{yx.z} = T^{-1} \sum_{t=1}^{T} (Y \mid Z)_t (X \mid Z)_t' = S_{yx} - S_{yz} S_{zz}^{-1} S_{zx} $$

2. Next, solve the eigenvalue problem


$$ \left|\lambda S_{xx.z} - S_{xy.z} S_{yy.z}^{-1} S_{yx.z}\right| = 0 $$

where |·| denotes the determinant.


3. The ordered eigenvalues are Λ = diag (λ1 , . . . , λq ) and the eigenvectors are V = (v1 , . . . , vq ),
so that

$$ S_{xx.z} V \Lambda = S_{xy.z} S_{yy.z}^{-1} S_{yx.z} V $$

Λ and V are known as the generalized eigenvalues and eigenvectors of $S_{xy.z} S_{yy.z}^{-1} S_{yx.z}$ with
respect to $S_{xx.z}$.

4. Recall that the factorization $\hat{\Pi}_{mle} = \hat{\alpha}_{mle}\hat{\beta}_{mle}'$
is not unique. Thus, V is normalized so that

$$ V' S_{xx.z} V = I $$

and

$$ V' S_{xy.z} S_{yy.z}^{-1} S_{yx.z} V = \Lambda $$

5. Finally, define the estimators


β̂ = (v1 , . . . , vr )
together with

α̂ = Syx.z β̂

$$ \hat{\Omega} = S_{yy.z} - S_{yx.z}\hat{\beta}\left(\hat{\beta}' S_{xx.z}\hat{\beta}\right)^{-1}\hat{\beta}' S_{xy.z} $$

Equivalently, once β̂ has been determined, α̂ and Γ̂ are determined by regression.


– Note the difference between the unrestricted estimate $\hat{Q}_{OLS} = S_{yx.z} S_{xx.z}^{-1}$ and the reduced rank
regression estimate $\hat{Q}_{RRR} = S_{yx.z}\hat{\beta}\left(\hat{\beta}' S_{xx.z}\hat{\beta}\right)^{-1}\hat{\beta}'$ of the coefficient matrix on X.

– For interpretations, it is often convenient to normalize or identify the cointegrating vectors by


choosing a specific coordinate system in which to express the variables.
– An arbitrary normalization is to solve for the triangular representation of the cointegrated system
(default method in Eviews).
– The resulting normalized cointegrating vector is denoted β̂ c,mle .

– The normalization of the MLE for β to β̂ c,mle will affect the MLE of α but not the MLEs of the
other parameters in the VECM.
– β̂ c,mle is super consistent

– Let β̂ c,mle denote the MLE of the normalized cointegrating matrix β c . Johansen (1995) showed
that
$$ T\left(\operatorname{vec}(\hat{\beta}_{c,mle}) - \operatorname{vec}(\beta_c)\right) $$

is asymptotically (mixed) normally distributed
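– The algorithm above can be written compactly in a few lines. The following numpy/scipy sketch is illustrative (the function name rrr and its interface are assumptions, and the deterministic terms and lagged differences are left to the caller through Z):

```python
import numpy as np
from scipy.linalg import eigh

def rrr(Y, X, Z, r):
    """Reduced rank regression sketch for Y_t = Pi X_t + Gamma Z_t + e_t with rank(Pi) <= r.
    Y, X, Z are (T x ny), (T x nx), (T x nz) arrays; returns (alpha, beta, Gamma)."""
    T = Y.shape[0]
    S = lambda a, b: a.T @ b / T                               # product moments S_ab
    Syy, Sxx, Szz = S(Y, Y), S(X, X), S(Z, Z)
    Syx, Syz, Sxz = S(Y, X), S(Y, Z), S(X, Z)

    # Step 1: partial out Z (Frisch-Waugh)
    Syy_z = Syy - Syz @ np.linalg.solve(Szz, Syz.T)
    Sxx_z = Sxx - Sxz @ np.linalg.solve(Szz, Sxz.T)
    Syx_z = Syx - Syz @ np.linalg.solve(Szz, Sxz.T)

    # Steps 2-4: generalized eigenproblem S_xy.z S_yy.z^{-1} S_yx.z v = lambda S_xx.z v,
    # with eigenvectors normalized so that V' S_xx.z V = I
    lam, V = eigh(Syx_z.T @ np.linalg.solve(Syy_z, Syx_z), Sxx_z)
    order = np.argsort(lam)[::-1]                              # largest eigenvalues first
    beta = V[:, order[:r]]                                     # (nx x r)

    # Step 5: alpha and Gamma by regression, given beta
    alpha = Syx_z @ beta                                       # (ny x r), valid under the normalization
    Gamma = (Syz - alpha @ beta.T @ Sxz) @ np.linalg.inv(Szz)  # (ny x nz)
    return alpha, beta, Gamma
```

In the VECM application, Y = ∆Yt , X = Yt−1 , and Z stacks the lagged differences and the deterministic terms, so β̂ contains the estimated cointegrating vectors.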

12.3.6 Hypothesis Testing about the coefficients


• Asymptotic standard chi-squared inference can be conducted on all parameters of the model using
likelihood ratio, LR, tests.
• For further details, please refer to Johansen et al. (1995)

