Time Series Analysis in Python with statsmodels
Abstract—We introduce the new time series analysis features of scikits.statsmodels. This includes descriptive statistics, statistical tests and several linear model classes: autoregressive, AR, autoregressive moving-average, ARMA, and vector autoregressive models, VAR.

Index Terms—time series analysis, statistics, econometrics, AR, ARMA, VAR, GLSAR, filtering, benchmarking

* Corresponding author: [email protected]
¶ Duke University
‡ University of North Carolina, Chapel Hill
§ American University

Copyright © 2011 Wes McKinney et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Introduction

Statsmodels is a Python package that provides a complement to SciPy for statistical computations, including descriptive statistics and estimation of statistical models. Beside the initial models (linear regression, robust linear models, generalized linear models and models for discrete data), the latest release of scikits.statsmodels includes some basic tools and models for time series analysis. This includes descriptive statistics, statistical tests and several linear model classes: autoregressive, AR, autoregressive moving-average, ARMA, and vector autoregressive models, VAR. In this article we introduce and provide an overview of the new time series analysis features of statsmodels. In the outlook at the end we point to some extensions and new models that are under development.

Time series data comprises observations that are ordered along one dimension, that is time, which imposes specific stochastic structures on the data. Our current models assume that observations are continuous, that time is discrete and equally spaced, and that we do not have missing observations. This type of data is very common in many fields, in economics and finance for example: national output, labor force, prices, stock market values, sales volumes, just to name a few.

In the following we briefly discuss some statistical properties of estimation with time series data, and then illustrate and summarize what is currently available in statsmodels.

Ordinary Least Squares (OLS)

The simplest linear model assumes that we observe an endogenous variable y and a set of regressors or explanatory variables x, where y and x are linked through a simple linear relationship plus a noise or error term

y_t = x_t β + ε_t

In the simplest case, the errors are independently and identically distributed. Unbiasedness of OLS requires that the regressors and errors be uncorrelated. If the errors are additionally normally distributed and the regressors are non-random, then the resulting OLS or maximum likelihood estimator (MLE) of β is also normally distributed in small samples. We obtain the same result if we consider the distributions as conditional on x_t when they are exogenous random variables. So far this is independent of whether t indexes time or any other index of observations.

When we have time series, there are two possible extensions that come from the intertemporal linkage of observations. In the first case, past values of the endogenous variable influence the expectation or distribution of the current endogenous variable; in the second case the errors ε_t are correlated over time. If we have either one case, we can still use OLS or generalized least squares, GLS, to get a consistent estimate of the parameters. If we have both cases at the same time, then OLS is not consistent anymore, and we need to use a non-linear estimator. This case is essentially what ARMA does.

Linear Model with autocorrelated error (GLSAR)

This model assumes that the explanatory variables, regressors, are uncorrelated with the error term, but that the error term is an autoregressive process, i.e.

E(x_t, ε_t) = 0

ε_t = a_1 ε_{t−1} + a_2 ε_{t−2} + ... + a_k ε_{t−k}

An example will be presented in the next section.

Linear Model with lagged dependent variables (OLS, AR, VAR)

This group of models assumes that past dependent variables, y_{t−i}, are included among the regressors, but that the error term is not serially correlated

E(ε_t, ε_s) = 0, for t ≠ s

y_t = a_1 y_{t−1} + a_2 y_{t−2} + ... + a_k y_{t−k} + x_t β + ε_t

Dynamic processes like autoregressive processes depend on observations in the past. This means that we have to decide what to do with the initial observations in our sample where we do not observe any past values.

The simplest way is to treat the first observation as fixed, and analyse our sample starting with the k-th observation. This leads to conditional least squares or conditional maximum likelihood estimation. For conditional least squares we can just use OLS to estimate, adding past endog to the exog. The vector autoregressive model (VAR) has the same basic statistical structure except that
we consider now a vector of endogenous variables at each point in time, and can also be estimated with OLS conditional on the initial information. (The stochastic structure of VAR is richer, because we now also need to take into account that there can be contemporaneous correlation of the errors, i.e. correlation at the same time point but across equations, but still uncorrelated across time.) The second estimation method that is currently available in statsmodels is maximum likelihood estimation. Following the same approach, we can use the likelihood function that is conditional on the first observations. If the errors are normally distributed, then this is essentially equivalent to least squares. However, we can easily extend conditional maximum likelihood to other models, for example GARCH, linear models with generalized autoregressive conditional heteroscedasticity, where the variance depends on the past, or models where the errors follow a non-normal distribution, for example Student-t distributed, which has heavier tails and is sometimes more appropriate in finance.

The second way to treat the problem of initial conditions is to model them together with the other observations, usually under the assumption that the process has started far in the past and that the initial observations are distributed according to the long run, i.e. stationary, distribution of the observations. This exact maximum likelihood estimator is implemented in statsmodels for the autoregressive process in statsmodels.tsa.AR, and for the ARMA process in statsmodels.tsa.ARMA.
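As a hedged sketch of these two estimators (the 'cmle' and 'mle' method names follow this release's tsa.AR; exact options may differ across versions), both can be applied to a simulated AR(1) series with an arbitrary illustrative coefficient:

import numpy as np
import scikits.statsmodels.api as sm

np.random.seed(42)
nobs = 250
# simulate an AR(1) series: y_t = 0.8 y_{t-1} + eps_t
y = np.zeros(nobs)
eps = np.random.randn(nobs)
for t in range(1, nobs):
    y[t] = 0.8 * y[t-1] + eps[t]

ar_mod = sm.tsa.AR(y)
# conditional MLE: the first observation is treated as fixed
ar_cmle = ar_mod.fit(maxlag=1, method='cmle')
# exact MLE: initial observations modeled from the stationary distribution
ar_mle = ar_mod.fit(maxlag=1, method='mle')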
Autoregressive Moving average model (ARMA)

ARMA combines an autoregressive process of the dependent variable with an error term, moving-average or MA, that includes the present and a linear combination of past error terms. An ARMA(p,q) is defined as

E(ε_t, ε_s) = 0, for t ≠ s

y_t = µ + a_1 y_{t−1} + ... + a_p y_{t−p} + ε_t + b_1 ε_{t−1} + ... + b_q ε_{t−q}

As a simplified notation, this is often expressed in terms of lag-polynomials as

φ(L) y_t = ψ(L) ε_t

where

φ(L) = 1 − a_1 L^1 − a_2 L^2 − ... − a_p L^p

ψ(L) = 1 + b_1 L^1 + b_2 L^2 + ... + b_q L^q

L is the lag or shift operator, L^i x_t = x_{t−i}, L^0 = 1. This is the same process that scipy.signal.lfilter uses. Forecasting with ARMA models has become popular since the 1970s as the Box-Jenkins methodology, since it often showed better forecast performance than more complex, structural models.
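To make the lag-polynomial connection concrete, here is a minimal sketch that simulates an ARMA(2,1) series by filtering white noise with scipy.signal.lfilter; the polynomial coefficients are arbitrary illustrations, not values used elsewhere in this paper:

import numpy as np
from scipy.signal import lfilter

np.random.seed(12345)
eps = np.random.randn(500)   # white noise eps_t

# phi(L) = 1 - 0.75 L + 0.25 L^2,  psi(L) = 1 + 0.3 L
ar_poly = [1, -0.75, 0.25]
ma_poly = [1, 0.3]

# lfilter(b, a, x) applies the rational filter b(L)/a(L) to x,
# so y = psi(L)/phi(L) eps satisfies phi(L) y_t = psi(L) eps_t
y = lfilter(ma_poly, ar_poly, eps)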
Using OLS to estimate this process, i.e. regressing y_t on past y_{t−i}, does not provide a consistent estimator. The process can be consistently estimated using either conditional least squares, which in this case is a non-linear estimator, or conditional maximum likelihood, or exact maximum likelihood. The difference between conditional methods and exact MLE is the same as described before. statsmodels provides estimators for both methods in tsa.ARMA, which will be described in more detail below.

Time series analysis is a vast field in econometrics with a large range of models that extend on the basic linear models with the assumption of normally distributed errors in many ways, and provides a range of statistical tests to identify an appropriate model specification or test the underlying assumptions.

Besides estimation of the main linear time series models, statsmodels also provides a range of descriptive statistics for time series data and associated statistical tests. We include an overview in the next section before describing AR, ARMA and VAR in more detail. Additional results that facilitate the usage and interpretation of the estimated models, for example impulse response functions, are also available.

OLS, GLSAR and serial correlation

Suppose we want to model a simple linear model that links the stock of money in the economy to real GDP and the consumer price index, CPI, an example from Greene (2003, ch. 12). We import numpy and statsmodels, load the variables from the example dataset included in statsmodels, transform the data and fit the model with OLS:

import numpy as np
import scikits.statsmodels.api as sm
tsa = sm.tsa  # as shorthand

mdata = sm.datasets.macrodata.load().data
endog = np.log(mdata['m1'])
exog = np.column_stack([np.log(mdata['realgdp']),
                        np.log(mdata['cpi'])])
exog = sm.add_constant(exog, prepend=True)

res1 = sm.OLS(endog, exog).fit()

print res1.summary() provides the basic overview of the regression results. We skip it here to save space. The Durbin-Watson statistic that is included in the summary is very low, indicating that there is a strong autocorrelation in the residuals. Plotting the residuals shows a similar strong autocorrelation.

As a more formal test we can calculate the autocorrelation, the Ljung-Box Q-statistic for the test of zero autocorrelation, and the associated p-values:

acf, ci, Q, pvalue = tsa.acf(res1.resid, nlags=4,
                             confint=95, qstat=True,
                             unbiased=True)
acf
#array([1., 0.982, 0.948, 0.904, 0.85])
pvalue
#array([3.811e-045, 2.892e-084,
#       6.949e-120, 2.192e-151])

To see how many autoregressive coefficients might be relevant, we can also look at the partial autocorrelation coefficients

tsa.pacf(res1.resid, nlags=4)
#array([1., 0.982, -0.497, -0.062, -0.227])

Similar regression diagnostics, for example for heteroscedasticity, are available in statsmodels.stats.diagnostic. Details on these functions and their options can be found in the documentation and docstrings.

The strong autocorrelation indicates that either our model is misspecified or there is strong autocorrelation in the errors. If we assume that the second is correct, then we can estimate the model with GLSAR. As an example, let us assume we consider four lags in the autoregressive error.

mod2 = sm.GLSAR(endog, exog, rho=4)
res2 = mod2.iterative_fit()

iterative_fit alternates between estimating the autoregressive process of the error term using tsa.yule_walker, and feasible sm.GLS. Looking at the estimation results shows two things: the parameter estimates are very different between OLS and GLS, and the autocorrelation in the residual is close to a random walk:
res1.params
#array([-1.502, 0.43 , 0.886])
res2.params
#array([-0.015, 0.01 , 0.034])
mod2.rho
#array([ 1.009, -0.003, 0.015, -0.028])
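The rho above is what the Yule-Walker step inside iterative_fit produces on each pass; as a sketch, the same kind of estimate can be computed directly from the OLS residuals:

# Yule-Walker estimate of an AR(4) process for the error term;
# the second return value is the residual standard deviation
rho, sigma = tsa.yule_walker(res1.resid, order=4)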
This indicates that the short run and long run dynamics might be
very different and that we should consider a richer dynamic model,
and that the variables might not be stationary and that there might
be unit roots.
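statsmodels also provides unit root tests that can make this check formal. As a hedged sketch (tsa.adfuller as in current releases; options and return values may differ), an augmented Dickey-Fuller test on the money stock could look like:

# ADF test with constant and trend; H0: endog has a unit root.
# The first two return values are the test statistic and p-value.
adf_stat, adf_pvalue = tsa.adfuller(endog, regression='ct')[:2]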
>>> arma_mod = tsa.ARMA(y)
>>> arma_res = arma_mod.fit(order=(2,1), trend='c',
...                         method='css-mle', disp=-1)
>>> arma_res.params
array([ 4.0092, -0.7747, 0.2062, -0.5563])

The estimation method, 'css-mle', indicates that the starting parameters for the optimization are to be obtained from the conditional sum of squares estimator, and then the exact likelihood is optimized. The exact likelihood is implemented using the Kalman Filter.
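The series y in the snippet above is defined in a part of the example not shown here; as a self-contained stand-in, a test series with known coefficients can be simulated, for instance with tsa.arma_generate_sample (an assumption on our part: the AR coefficients are passed as the coefficients of the lag-polynomial phi(L), hence the signs):

import numpy as np
import scikits.statsmodels.api as sm

np.random.seed(765367)
# phi(L) = 1 - 0.8 L + 0.5 L^2,  psi(L) = 1 + 0.4 L
ar = [1, -0.8, 0.5]
ma = [1, 0.4]
# shift by a constant so that the estimated trend 'c' is nonzero
y = 4.0 + sm.tsa.arma_generate_sample(ar, ma, nsample=500)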
Filtering

We have recently implemented several filters that are commonly used in economics and finance applications. The three most popular methods are the Hodrick-Prescott, the Baxter-King, and the Christiano-Fitzgerald filters. These can all be viewed as approximations of the ideal band-pass filter; however, discussion of the ideal band-pass filter is beyond the scope of this paper. We will briefly review the implementation details of each method and then present some usage examples.

The Hodrick-Prescott filter was proposed by Hodrick and Prescott [HPres], though the method itself has been in use across the sciences since at least 1876 [Stigler]. The idea is to separate a time-series y_t into a trend τ_t and cyclical component ζ_t

y_t = τ_t + ζ_t

The components are determined by minimizing the following quadratic loss function

min_{τ_t} ∑_{t=1}^{T} ζ_t^2 + λ ∑_{t=1}^{T} [(τ_t − τ_{t−1}) − (τ_{t−1} − τ_{t−2})]^2

where τ_t = y_t − ζ_t and λ is the weight placed on the penalty for roughness. Hodrick and Prescott suggest using λ = 1600 for quarterly data. Ravn and Uhlig [RUhlig] suggest λ = 6.25 and λ = 129600 for annual and monthly data, respectively. While there are numerous methods for solving the loss function, our implementation uses scipy.sparse.linalg.spsolve to find the solution to the generalized ridge-regression suggested in Danthine and Girardin [DGirard].
Girardine [DGirard]. as described above with the same interpretation.
Baxter and King [BKing] propose an approximate band-pass Moving on to some examples, the below demonstrates the API
filter that deals explicitly with the periodicity of the business cycle. and resultant filtered series for each method. We use series for
By applying their band-pass filter to a time-series yt , they produce unemployment and inflation to demonstrate 2. They are tradi-
a series yt∗ that does not contain fluctuations at frequencies higher tionally thought to have a negative relationship at business cycle
or lower than those of the business cycle. Specifically, in the frequencies.
time domain the Baxter-King filter takes the form of a symmetric >>> from scipy.signal import lfilter
moving average >>> data = sm.datasets.macrodata.load()
K >>> infl = data.data.infl[1:]
yt∗ = ∑ ak yt−k >>> # get 4 qtr moving average
k=−K >>> infl = lfilter(np.ones(4)/4, 1, infl)[4:]
>>> unemp = data.data.unemp[1:]
where ak = a−k for symmetry and ∑Kk=−K ak = 0 such that the
filter has trend elimination properties. That is, series that contain To apply the Hodrick-Prescott filter to the data 3, we can do
quadratic deterministic trends or stochastic processes that are >>> infl_c, infl_t = tsa.filters.hpfilter(infl)
integrated of order 1 or 2 are rendered stationary by application of >>> unemp_c, unemp_t = tsa.filters.hpfilter(unemp)
the filter. The filter weights ak are given as follows The Baxter-King filter 4 is applied as
a j = B j + θ for j = 0, ±1, ±2, . . . , ±K >>> infl_c = tsa.filters.bkfilter(infl)
>>> unemp_c = tsa.filters.bkfilter(unemp)
(ω2 − ω1 )
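Written out in code, the weight construction is only a few lines. This sketch is purely illustrative (tsa.filters.bkfilter does this internally):

import numpy as np

def bk_weights(PL=6, PH=32, K=12):
    # pass fluctuations with periods between PL and PH observations
    w1, w2 = 2 * np.pi / PH, 2 * np.pi / PL
    j = np.arange(1, K + 1)
    B = np.r_[(w2 - w1) / np.pi,
              (np.sin(w2 * j) - np.sin(w1 * j)) / (np.pi * j)]
    B = np.r_[B[:0:-1], B]            # B_{-K}, ..., B_0, ..., B_K
    theta = -B.sum() / (2 * K + 1)    # so the weights sum to zero
    return B + theta

# applying the filter loses K observations at each end:
# cycle = np.convolve(y, bk_weights(), mode='valid')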
The last filter that we currently provide is that of Christiano and Fitzgerald [CFitz]. The Christiano-Fitzgerald filter is again a weighted moving average. However, their filter is asymmetric about t and operates under the (generally false) assumption that y_t follows a random walk. This assumption allows their filter to approximate the ideal filter even if the exact time-series model of y_t is not known. The implementation of their filter involves the calculations of the weights in

y_t^* = B_0 y_t + B_1 y_{t+1} + ... + B_{T−1−t} y_{T−1} + B̃_{T−t} y_T + B_1 y_{t−1} + ... + B_{t−2} y_2 + B̃_{t−1} y_1

for t = 3, 4, ..., T − 2, where

B_j = (sin(jb) − sin(ja)) / (π j),  j ≥ 1

B_0 = (b − a) / π,  a = 2π / P_U,  b = 2π / P_L

B̃_{T−t} and B̃_{t−1} are linear functions of the B_j's, and the values for t = 1, 2, T − 1, and T are also calculated in much the same way. See the authors' paper or our code for the details. P_U and P_L are as described above with the same interpretation.

Moving on to some examples, the code below demonstrates the API and the resultant filtered series for each method. We use series for unemployment and inflation to demonstrate (Figure 2). They are traditionally thought to have a negative relationship at business cycle frequencies.

>>> from scipy.signal import lfilter
>>> data = sm.datasets.macrodata.load()
>>> infl = data.data.infl[1:]
>>> # get 4 qtr moving average
>>> infl = lfilter(np.ones(4)/4, 1, infl)[4:]
>>> unemp = data.data.unemp[1:]

To apply the Hodrick-Prescott filter to the data (Figure 3), we can do

>>> infl_c, infl_t = tsa.filters.hpfilter(infl)
>>> unemp_c, unemp_t = tsa.filters.hpfilter(unemp)

The Baxter-King filter (Figure 4) is applied as

>>> infl_c = tsa.filters.bkfilter(infl)
>>> unemp_c = tsa.filters.bkfilter(unemp)

The Christiano-Fitzgerald filter is similarly applied (Figure 5)

>>> infl_c, infl_t = tsa.filters.cfilter(infl)
>>> unemp_c, unemp_t = tsa.filters.cfilter(unemp)
Fig. 2: Unfiltered Inflation and Unemployment Rates 1959Q4-2009Q1
Statistical Benchmarking
We also provide for another frequent need of those who work with
time-series data of varying observational frequency: that of
benchmarking. Benchmarking is a kind of interpolation that involves
creating a high-frequency dataset from a low-frequency one in a
consistent way. The need for benchmarking arises when one has
a low-frequency series that is perhaps annual and is thought to
be reliable, and the researcher also has a higher frequency series
that is perhaps quarterly or monthly. A benchmarked series is a
high-frequency series consistent with the benchmark of the low-
frequency series.
We have implemented Denton's modified method, originally proposed by Denton [Denton] and improved by Cholette [Cholette]. To take the example of turning an annual series into a quarterly one, Denton's method entails finding a benchmarked series X_t that solves

min_{X_t} ∑_{t=2}^{T} ( X_t / I_t − X_{t−1} / I_{t−1} )^2

subject to

∑ X_t = A_y,  y = {1, ..., β}
That is, the sum of the benchmarked series must equal the annual
benchmark in each year. In the above, A_y is the annual benchmark
for year y, I_t is the high-frequency indicator series, and β is the
last year for which the annual benchmark is available. If T > 4β,
then extrapolation is performed at the end of the series. To take
an example, given the US monthly industrial production index
and quarterly GDP data, from 2009 and 2010, we can construct a
benchmarked monthly GDP series
>>> iprod_m = np.array([ 87.4510, 86.9878, 85.5359,
84.7761, 83.8658, 83.5261, 84.4347,
85.2174, 85.7983, 86.0163, 86.2137,
86.7197, 87.7492, 87.9129, 88.3915,
88.7051, 89.9025, 89.9970, 90.7919,
90.9898, 91.2427, 91.1385, 91.4039,
92.5646])
>>> gdp_q = np.array([14049.7, 14034.5, 14114.7,
14277.3, 14446.4, 14578.7, 14745.1,
14871.4])
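The benchmarked monthly series is then produced by the Denton routine; as a sketch (tsa.interp.dentonm as in this release; check the docstring for the exact freq options):

>>> # distribute quarterly GDP over months, following the
>>> # monthly industrial production indicator
>>> gdp_m = tsa.interp.dentonm(iprod_m, gdp_q, freq="qm")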
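The VAR examples below use a results object res from the vector autoregression model; a minimal sketch of how it could be produced, with the variable selection, lag order, and names argument all assumptions for illustration, following the macrodata example above:

>>> names = ['realgdp', 'realcons', 'realinv']
>>> var_data = np.column_stack([np.log(mdata[nm]) for nm in names])
>>> var_data = np.diff(var_data, axis=0)   # log growth rates
>>> var_mod = tsa.VAR(var_data, names=names)
>>> res = var_mod.fit(2)                   # a VAR(2)
>>> res.plot_forecast(5)                   # 5 step ahead forecast (Fig. 8)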
Fig. 8: VAR 5 step ahead forecast

The forecast error variance decomposition can also be computed and plotted like so

>>> res.fevd().plot()

Fig. 9: VAR Forecast error variance decomposition

Various tests, such as testing for Granger causality, can be carried out using the results object:

>>> res.test_causality('realinv', 'realcons')
H_0: ['realcons'] do not Granger-cause realinv
Conclusion: reject H_0 at 5.00% significance level
{'conclusion': 'reject',
 'crit_value': 3.0112857238108273,
 'df': (2, 579),
 'pvalue': 3.7842822166888971e-10,
 'signif': 0.05,
 'statistic': 22.528593566083575}

Conclusions

statsmodels development over the last few years has been focused on building correct and tested implementations of the standard suite of econometric models available in other statistical computing environments, such as R. However, there is still a long road ahead before Python will be on the same level library-wise with other computing environments focused on statistics and econometrics. We believe that, given the wealth of powerful scientific computing and interactive research tools coupled with the excellent Python language, statsmodels can make Python become a premier environment for doing applied statistics and econometrics work. Future work will need to integrate all of these tools to create a smooth and intuitive user experience comparable to industry standard commercial and open source statistical products.

We have built a foundational set of tools for several ubiquitous classes of time series models which we hope will go a long way toward meeting the needs of applied statisticians and econometricians programming in Python.
References
[BKing] Baxter, M. and King, R.G. 1999. "Measuring Business Cycles:
Approximate Band-pass Filters for Economic Time Series." Re-
view of Economics and Statistics, 81.4, 575-93.
[Cholette] Cholette, P.A. 1984. "Adjusting Sub-annual Series to Yearly
Benchmarks." Survey Methodology, 10.1, 35-49.
[CFitz] Christiano, L.J. and Fitzgerald, T.J. 2003. "The Band Pass Filter."
International Economic Review, 44.2, 435-65.
[DGirard] Danthine, J.P. and Girardin, M. 1989. "Business Cycles in
Switzerland: A Comparative Study." European Economic Review
33.1, 31-50.
[Denton] Denton, F.T. 1971. "Adjustment of Monthly or Quarterly Series to
Annual Totals: An Approach Based on Quadratic Minimization."
Journal of the American Statistical Association, 66.333, 99-102.
[HPres] Hodrick, R.J. and Prescott, E.C. 1997. "Postwar US Business
Cycles: An Empirical Investigation." Journal of Money, Credit,
and Banking, 29.1, 1-16.
[RUhlig] Ravn, M.O. and Uhlig, H. 2002. "On Adjusting the Hodrick-
Prescott Filter for the Frequency of Observations." Review of
Economics and Statistics, 84.2, 371-6.
[Stigler] Stigler, S.M. 1978. "Mathematical Statistics in the Early States."
Annals of Statistics, 6, 239-65.
[Lütkepohl] Lütkepohl, H. 2005. "A New Introduction to Multiple Time
Series Analysis." Springer.