(eBook PDF) Time Series: A Data Analysis Approach Using R
Contents
Preface xi
4 ARMA Models 67
4.1 Autoregressive Moving Average Models . . . . . . . . . . . . . . 67
4.2 Correlation Functions . . . . . . . . . . . . . . . . . . . . . . . . 76
4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.4 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5 ARIMA Models 99
5.1 Integrated Models . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 Building ARIMA Models . . . . . . . . . . . . . . . . . . . . . 104
5.3 Seasonal ARIMA Models . . . . . . . . . . . . . . . . . . . . . . 111
5.4 Regression with Autocorrelated Errors * . . . . . . . . . . . . . 122
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6 Spectral Analysis and Filtering 129
6.1 Periodicity and Cyclical Behavior . . . . . . . . . . . . . . . . . 129
6.2 The Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . 137
6.3 Linear Filters * . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
References 253
Index 257
Preface
The goal of this book is to develop an appreciation for the richness and versatility
of modern time series analysis as a tool for analyzing data. A useful feature of
the presentation is the inclusion of nontrivial data sets illustrating the richness of
potential applications in medicine and in the biological, physical, and social sciences.
We include data analysis both in the text examples and in the problem sets.
The text can be used for a one semester/quarter introductory time series course
where the prerequisites are an understanding of linear regression and basic calculus-
based probability skills (primarily expectation). We assume general math skills at
the high school level (trigonometry, complex numbers, polynomials, calculus, and so
on).
All of the numerical examples use the R statistical package (R Core Team, 2018).
We do not assume the reader has previously used R, so Appendix A has an extensive
presentation of everything that will be needed to get started. In addition, there are
several simple exercises in the appendix that may help first-time users get more
comfortable with the software. We typically require students to do the R exercises as
the first homework assignment, and we have found this requirement to be successful.
Various topics are explained using linear regression analogies, and some estimation
procedures require techniques used in nonlinear regression. Consequently, the
reader should have a solid knowledge of linear regression analysis, including multiple
regression and weighted least squares. Some of this material is reviewed in Chapter 3
and Chapter 4.
A calculus-based introductory course on probability is an essential prerequisite.
The basics are covered briefly in Appendix B. It is assumed that students are familiar
with most of the content of that appendix and that it can serve as a refresher.
For readers who are a bit rusty on high school math skills, there are a number of
free books that are available on the internet (search on Wikibooks K-12 Mathematics).
For the chapters on spectral analysis (Chapter 6 and 7), a minimal knowledge of
complex numbers is needed, and we provide this material in Appendix C.
There are a few starred (*) items throughout the text. These sections and examples
are starred because the material covered in the section or example is not needed to
move on to subsequent sections or examples. This does not necessarily mean that the
material is more difficult; it simply means that the section or example
may be covered at a later time or skipped entirely without disrupting the continuity.
Chapter 8 is starred because the sections of that chapter are independent special
topics that may be covered (or skipped) in any order. In a one-semester course, we
can usually cover Chapter 1 – Chapter 7 and at least one topic from Chapter 8.
Some homework problems have “hints” in the back of the book. The hints vary
in detail: some are nearly complete solutions, while others are small pieces of advice
or code to help start a problem.
The text is informally separated into four parts. The first part, Chapter 1 –
Chapter 3, is a general introduction to the fundamentals, the language, and the
methods of time series analysis. The second part, Chapter 4 – Chapter 5, presents
ARIMA modeling. Some technical details have been moved to Appendix D because,
while the material is not essential, we like to explain the ideas to students who know
mathematical statistics. For example, MLE is covered in Appendix D, but in the main
part of the text, it is only mentioned in passing as being related to unconditional least
squares. The third part, Chapter 6 – Chapter 7, covers spectral analysis and filtering.
We usually spend a small amount of class time going over the material on complex
numbers in Appendix C before covering spectral analysis. In particular, we make sure
that students see Section C.1 – Section C.3. The fourth part of the text consists of the
special topics covered in Chapter 8. Most students want to learn GARCH models, so
if we can only cover one section of that chapter, we choose Section 8.1.
Finally, we mention the similarities and differences between this text and Shumway
and Stoffer (2017), which is a graduate-level text. There are obvious similarities
because the authors are the same and we use the same R package, astsa, and
consequently the data sets in that package. The package has been updated for this text
and contains new and updated data sets and some updated scripts. We assume astsa
version 1.8.6 or later has been installed; see Section A.2. The mathematics level of
this text is more suited to undergraduate students and non-majors. In this text, the
chapters are short and a topic may be advanced over multiple chapters. Relative to the
coverage, there are more data analysis examples in this text. Each numerical example
has output and complete R code included, even if the code is mundane like setting up
the margins of a graphic or defining colors with the appearance of transparency. We
will maintain a website for the text at www.stat.pitt.edu/stoffer/tsda. A solutions manual
is available for instructors who adopt the book at www.crcpress.com.
1.1 Introduction
The analysis of data observed at different time points leads to unique problems that
are not covered by classical statistics. The dependence introduced by the sampling
of data over time restricts the applicability of many conventional statistical methods that
require random samples. The analysis of such data is commonly referred to as time
series analysis.
To provide a statistical setting for describing the elements of time series data,
the data are represented as a collection of random variables indexed according to
the order they are obtained in time. For example, if we collect data on daily high
temperatures in your city, we may consider the time series as a sequence of random
variables, x1 , x2 , x3 , . . . , where the random variable x1 denotes the high temperature
on day one, the variable x2 denotes the value for the second day, x3 denotes the
value for the third day, and so on. In general, a collection of random variables, { xt },
indexed by t is referred to as a stochastic process. In this text, t will typically be
discrete and vary over the integers t = 0, ±1, ±2, . . . or some subset of the integers,
or a similar index like months of a year.
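In R, such a discretely indexed collection is typically stored as a ts object. A minimal sketch (the temperature values below are made up for illustration):

```r
# hypothetical daily high temperatures, x1, x2, ..., x7
temps = c(31, 30, 33, 34, 32, 29, 30)
x = ts(temps, start = 1, frequency = 1)  # index t = 1, 2, ..., 7
x[1]  # the observed value of the random variable x1 (day one)
```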
Historically, time series methods were applied to problems in the physical and
environmental sciences. This fact accounts for the engineering nomenclature that
permeates the language of time series analysis. The first step in an investigation
of time series data involves careful scrutiny of the recorded data plotted over time.
Before looking more closely at the particular statistical methods, we mention that
two separate, but not mutually exclusive, approaches to time series analysis exist,
commonly identified as the time domain approach (Chapter 4 and 5) and the frequency
domain approach (Chapter 6 and 7).
The following examples illustrate some of the common kinds of time series data as
well as some of the statistical questions that might be asked about such data.
Figure 1.1 Johnson & Johnson quarterly earnings per share (QEPS).
Figure 1.2 Yearly average global land surface and ocean surface temperature deviations
(1880–2017) in ◦ C.
rt = ( xt − xt−1 )/xt−1 .
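The return rt defined above can be computed directly in R; a small sketch with made-up prices:

```r
x = c(100, 102, 101, 105)     # hypothetical closing prices x_t
rt = diff(x) / x[-length(x)]  # (x_t - x_{t-1}) / x_{t-1}
rt
```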
Figure 1.3 Dow Jones Industrial Average (DJIA) trading days closings (top) and returns
(bottom) from April 20, 2006 to April 20, 2016.
log(1 + r) = r − r^2/2 + r^3/3 − · · · ,   −1 < r ≤ 1,
we see that if r is very small, the higher-order terms will be negligible. Consequently,
because for financial data, xt /xt−1 ≈ 1, we have
log(1 + rt ) ≈ rt .
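A quick numerical check of the approximation for a typical small daily return:

```r
r = 0.01
log(1 + r)  # 0.00995..., very close to r itself
```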
Note the financial crisis of 2008 in Figure 1.3. The data shown are typical of
return data. The mean of the series appears to be stable with an average return of
approximately zero; however, the volatility (or variability) of the data exhibits clustering;
that is, highly volatile periods tend to be clustered together. A problem in the analysis
of these types of financial data is to forecast the volatility of future returns. Models
have been developed to handle these problems; see Chapter 8. The data set is an xts
data file, so it must be loaded.
Figure 1.4 US GDP growth rate calculated using logs (–◦–) and actual values (+).
library(xts)
djia_return = diff(log(djia$Close))[-1]
par(mfrow=2:1)
plot(djia$Close, col=4)
plot(djia_return, col=4)
You can see a comparison of rt and log(1 + rt ) in Figure 1.4, which shows the
seasonally adjusted quarterly growth rate, rt , of US GDP compared to the version
obtained by calculating the difference of the logged data.
tsplot(diff(log(gdp)), type="o", col=4, ylab="GDP Growth") # diff-log
points(diff(gdp)/lag(gdp,-1), pch=3, col=2) # actual return
It turns out that many time series behave like this, so that logging the data and
then taking successive differences is a standard data transformation in time series
analysis. ♦
Example 1.4. El Niño – Southern Oscillation (ENSO)
The Southern Oscillation Index (SOI) measures changes in air pressure related to sea
surface temperatures in the central Pacific Ocean. The central Pacific warms every
three to seven years due to the ENSO effect, which has been blamed for various global
extreme weather events. During El Niño, pressure over the eastern and western Pacific
reverses, causing the trade winds to diminish and leading to an eastward movement
of warm water along the equator. As a result, the surface waters of the central and
eastern Pacific warm with far-reaching consequences to weather patterns.
Figure 1.5 shows monthly values of the Southern Oscillation Index (SOI) and
associated Recruitment (an index of the number of new fish). Both series are for
a period of 453 months ranging over the years 1950–1987. They both exhibit an
obvious annual cycle (hot in the summer, cold in the winter), and, though difficult to
see, a slower frequency of three to seven years. The study of the kinds of cycles and
Figure 1.5 Monthly SOI and Recruitment (estimated new fish), 1950–1987.
their strengths is the subject of Chapter 6 and 7. The two series are also related; it is
easy to imagine that fish population size is dependent on the ocean temperature.
The following R code will reproduce Figure 1.5:
par(mfrow = c(2,1))
tsplot(soi, ylab="", xlab="", main="Southern Oscillation Index", col=4)
text(1970, .91, "COOL", col="cyan4")
text(1970,-.91, "WARM", col="darkmagenta")
tsplot(rec, ylab="", main="Recruitment", col=4)
♦
Example 1.5. Predator–Prey Interactions
While it is clear that predators influence the numbers of their prey, prey affect the
number of predators because when prey become scarce, predators may die of starvation
or fail to reproduce. Such relationships are often modeled by the Lotka–
Volterra equations, which are a pair of simple nonlinear differential equations (e.g.,
see Edelstein-Keshet, 2005, Ch. 6).
One of the classic studies of predator–prey interactions is the snowshoe hare and
lynx pelts purchased by the Hudson’s Bay Company of Canada. While this is an
indirect measure of predation, the assumption is that there is a direct relationship
between the number of pelts collected and the number of hare and lynx in the wild.
These predator–prey interactions often lead to cyclical patterns of predator and prey
abundance seen in Figure 1.6. Notice that the lynx and hare population sizes are
asymmetric in that they tend to increase slowly and decrease quickly.
The lynx prey varies from small rodents to deer, with the snowshoe hare being
150
Hare
Lynx
( × 1000)
100
Number
50 0
Figure 1.6 Time series of the predator–prey interactions between the snowshoe hare and lynx
pelts purchased by the Hudson’s Bay Company of Canada. It is assumed there is a direct
relationship between the number of pelts collected and the number of hare and lynx in the wild.
its overwhelmingly favored prey. In fact, lynx are so closely tied to the snowshoe
hare that its population rises and falls with that of the hare, even though other food
sources may be abundant. In this case, it seems reasonable to model the size of the
lynx population in terms of the snowshoe population. This idea is explored further in
Example 5.17.
Figure 1.6 may be reproduced as follows.
culer = c(rgb(.85,.30,.12,.6), rgb(.12,.67,.86,.6))
tsplot(Hare, col = culer[1], lwd=2, type="o", pch=0,
ylab=expression(Number~~~(""%*% 1000)))
lines(Lynx, col=culer[2], lwd=2, type="o", pch=2)
legend("topright", col=culer, lty=1, lwd=2, pch=c(0,2),
legend=c("Hare", "Lynx"), bty="n")
♦
Example 1.6. fMRI Imaging
Often, time series are observed under varying experimental conditions or treatment
configurations. Such a set of series is shown in Figure 1.7, where data are collected
from various locations in the brain via functional magnetic resonance imaging (fMRI).
In fMRI, subjects are put into an MRI scanner and a stimulus is applied for a
period of time, and then stopped. This on-off application of a stimulus is repeated
and recorded by measuring the blood oxygenation-level dependent (BOLD) signal
intensity, which measures areas of activation in the brain. The BOLD contrast results
from changing regional blood concentrations of oxy- and deoxy- hemoglobin.
The data displayed in Figure 1.7 are from an experiment that used fMRI to
examine the effects of general anesthesia on pain perception by comparing results
from anesthetized volunteers while a supramaximal shock stimulus was applied. This
stimulus was used to simulate surgical incision without inflicting tissue damage. In
Figure 1.7 fMRI data from two locations in the cortex, the thalamus, and the cerebellum;
n = 128 points, one observation taken every 2 seconds. The boxed line represents the
presence or absence of the stimulus.
this example, the stimulus was applied for 32 seconds and then stopped for 32 seconds,
so that the signal period is 64 seconds. The sampling rate was one observation every
2 seconds for 256 seconds (n = 128).
Notice that the periodicities appear strongly in the motor cortex series but seem to
be missing in the thalamus and perhaps in the cerebellum. In this case, it is of interest
to statistically determine if the areas in the thalamus and cerebellum are actually
responding to the stimulus. Use the following R commands for the graphic:
par(mfrow=c(3,1))
culer = c(rgb(.12,.67,.85,.7), rgb(.67,.12,.85,.7))
u = rep(c(rep(.6,16), rep(-.6,16)), 4) # stimulus signal
tsplot(fmri1[,4], ylab="BOLD", xlab="", main="Cortex", col=culer[1],
ylim=c(-.6,.6), lwd=2)
lines(fmri1[,5], col=culer[2], lwd=2)
lines(u, type="s")
tsplot(fmri1[,6], ylab="BOLD", xlab="", main="Thalamus", col=culer[1],
ylim=c(-.6,.6), lwd=2)
lines(fmri1[,7], col=culer[2], lwd=2)
lines(u, type="s")
tsplot(fmri1[,8], ylab="BOLD", xlab="", main="Cerebellum",
col=culer[1], ylim=c(-.6,.6), lwd=2)
lines(fmri1[,9], col=culer[2], lwd=2)
lines(u, type="s")
mtext("Time (1 pt = 2 sec)", side=1, line=1.75)
♦
1.3 Time Series Models
The primary objective of time series analysis is to develop mathematical models that
provide plausible descriptions for sample data, like that encountered in the previous
section.
The fundamental visual characteristic distinguishing the different series shown in
Example 1.1 – Example 1.6 is their differing degrees of smoothness. A parsimonious
explanation for this smoothness is that adjacent points in time are correlated, so
the value of the series at time t, say, xt , depends in some way on the past values
xt−1 , xt−2 , . . .. This idea expresses a fundamental way in which we might think
about generating realistic looking time series.
Example 1.7. White Noise
A simple kind of generated series might be a collection of uncorrelated random
variables, wt , with mean 0 and finite variance σw2 . The time series generated from
uncorrelated variables is used as a model for noise in engineering applications where it
is called white noise; we shall sometimes denote this process as wt ∼ wn(0, σw2 ). The
designation white originates from the analogy with white light (details in Chapter 6).
A special version of white noise that we use is when the variables are independent
and identically distributed normals, written wt ∼ iid N(0, σw2 ).
The upper panel of Figure 1.8 shows a collection of 500 independent standard
normal random variables (σw2 = 1), plotted in the order in which they were drawn. The
resulting series bears a resemblance to portions of the DJIA returns in Figure 1.3. ♦
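A sketch of how the upper panel might be generated, assuming tsplot() from the astsa package is loaded:

```r
library(astsa)             # assumes astsa is installed
set.seed(1)                # arbitrary seed for reproducibility
w = rnorm(500)             # 500 iid N(0,1) variates
tsplot(w, col=4, main="white noise")
```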
If the stochastic behavior of all time series could be explained in terms of the
white noise model, classical statistical methods would suffice. Two ways of introducing
serial correlation and more smoothness into time series models are given in
Example 1.8 and Example 1.9.
Example 1.8. Moving Averages, Smoothing and Filtering
We might replace the white noise series wt by a moving average that smoothes the
series. For example, consider replacing wt in Example 1.7 by an average of its current
value and its immediate neighbors in the past and future. That is, let
vt = (wt−1 + wt + wt+1)/3,   (1.1)
which leads to the series shown in the lower panel of Figure 1.8. This series is much
smoother than the white noise series and has a smaller variance due to averaging.
It should also be apparent that averaging removes some of the high frequency (fast
oscillations) behavior of the noise.
Figure 1.8 Gaussian white noise series (top) and three-point moving average of the Gaussian
white noise series (bottom).
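The three-point moving average in (1.1) can be formed with R's filter(); a sketch, assuming a white noise series and tsplot() from astsa:

```r
library(astsa)                              # assumes astsa is installed
w = rnorm(500)                              # white noise
v = filter(w, sides=2, filter=rep(1/3, 3))  # moving average (1.1)
tsplot(v, col=4, main="moving average")
```

Averaging three uncorrelated variables also reduces the variance, from σw2 to σw2/3, consistent with the smoother appearance of the lower panel.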
xt = 1.5xt−1 − .75xt−2 + wt (1.2)
successively for t = 1, 2, . . . , 250. The resulting output series is shown in Figure 1.9.
Equation (1.2) represents a regression or prediction of the current value xt of a
time series as a function of the past two values of the series, and, hence, the term
autoregression is suggested for this model. A problem with startup values exists here
because (1.2) also depends on the initial conditions x0 and x−1 , but for now we set
them to zero. We can then generate data recursively by substituting into (1.2). That
is, given w1 , w2 , . . . , w250 , we could set x−1 = x0 = 0 and then start at t = 1:
x1 = 1.5x0 − .75x−1 + w1 = w1
x2 = 1.5x1 − .75x0 + w2 = 1.5w1 + w2
x3 = 1.5x2 − .75x1 + w3
x4 = 1.5x3 − .75x2 + w4
and so on. We note the approximate periodic behavior of the series, which is similar
to that displayed by the SOI and Recruitment in Figure 1.5 and some fMRI series
in Figure 1.7. This particular model is chosen so that the data have pseudo-cyclic
behavior of about 1 cycle every 12 points; thus 250 observations should contain
about 20 cycles. This autoregressive model and its generalizations can be used as an
underlying model for many observed series and will be studied in detail in Chapter 4.
One way to simulate and plot data from the model (1.2) in R is to use the following
commands. The initial conditions are set equal to zero so we let the filter run an extra
50 values to avoid startup problems.
set.seed(90210)
w = rnorm(250 + 50) # 50 extra to avoid startup problems
x = filter(w, filter=c(1.5,-.75), method="recursive")[-(1:50)]
tsplot(x, main="autoregression", col=4)
♦
Example 1.10. Random Walk with Drift
A model for analyzing a trend, such as that seen in the global temperature data in
Figure 1.2, is the random walk with drift model given by
xt = δ + xt−1 + wt (1.3)
Figure 1.10 Random walk, σw = 1, with drift δ = .3 (upper jagged line), without drift, δ = 0
(lower jagged line), and dashed lines showing the drifts.
for t = 1, 2, . . . , with initial condition x0 = 0, where wt is white noise and the
constant δ is called the drift. Summing (1.3) gives the cumulative form
xt = δ t + w1 + w2 + · · · + wt (1.4)
for t = 1, 2, . . .; either use induction, or plug (1.4) into (1.3) to verify this statement.
Figure 1.10 shows 200 observations generated from the model with δ = 0 and .3,
and with standard normal noise. For comparison, we also superimposed the straight
lines δt on the graph. To reproduce Figure 1.10 in R use the following code (notice
the use of multiple commands per line using a semicolon).
set.seed(314159265) # so you can reproduce the results
w = rnorm(200); x = cumsum(w) # random walk
wd = w +.3; xd = cumsum(wd) # random walk with drift
tsplot(xd, ylim=c(-2,80), main="random walk", ylab="", col=4)
abline(a=0, b=.3, lty=2, col=4) # plot drift
lines(x, col="darkred")
abline(h=0, col="darkred", lty=2)
♦
Example 1.11. Signal Plus Noise
Many realistic models for generating time series assume an underlying signal with
some consistent periodic variation contaminated by noise. For example, it is easy to
detect the regular cycle in the fMRI series displayed at the top of Figure 1.7. Consider the
model
xt = 2 cos(2π(t + 15)/50) + wt (1.5)
for t = 1, 2, . . . , 500, where the first term is regarded as the signal.
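A minimal sketch of simulating data from model (1.5), assuming tsplot() from astsa:

```r
library(astsa)               # assumes astsa is installed
t = 1:500
cs = 2*cos(2*pi*(t+15)/50)   # the signal in (1.5)
x = cs + rnorm(500)          # signal plus noise
tsplot(x, col=4, main="signal plus noise")
```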