Time Series Analysis (TSA) - Tutorial
D G Rossiter
Cornell University, Section of Soil & Crop Sciences
ISRIC–World Soil Information
March 3, 2020
“Take time to sharpen your axe before you start to chop
firewood.”
Contents
1 Introduction 1
5 Intervention analysis 98
5.1 A known intervention . . . . . . . . . . . . . . . . . . . . . . . . 98
5.2 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8 Simulation 125
8.1 AR models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.2 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
References 131
1 Introduction
2 Loading and examining a time series
The example datasets also illustrate some of the problems of importing external datasets
into R and putting the data into a form suitable for time-series analysis.
All the datasets in this exercise are assumed to be stored in the ds_tsa
“datasets” subdirectory, under the directory where this tutorial is stored.
Q1 : What are some things that a water manager would want to know
about this time series? Jump to A1 •
Task 1 : Start R and switch to the directory where the example datasets
are stored. •
34.36
34.45
34.7
...
55.96
55.55
54.83
Q2 : Can you tell from this file that it is a time series? Jump to A2 •
⁵ Kindly provided by colleagues in the University of Twente / Faculty ITC's Water Resources department, https://www.itc.nl/WRS
Task 3 : Read this file into R and examine its structure. •
Using the scan function to read a file into a vector:
gw <- scan("./ds_tsa/anatolia_hati.txt")
str(gw)
Task 4 : Convert the vector of measurements for this well into a time
series and examine its structure and attributes. •
The ts function converts a vector into a time series; it has several argu-
ments, of which enough must be given to specify the series:
Only one of frequency or deltat should be provided; they are two ways
to specify the same information. The ending date end is optional if either
frequency or deltat is given, because the end is then simply where the
vector ends.
In this example we know from the metadata that the series begins in
January 1975 and ends in December 2004; it is a monthly series. The
simplest way to specify it is by starting date and frequency.
After the series is established, we examine its structure with str and
attributes with attributes.
gw <- ts(gw, start=1975, frequency=12)
str(gw)
## Time-Series [1:360] from 1975 to 2005: 34.4 34.5 34.7 34.8 34.9 ...
attributes(gw)
## $tsp
## [1] 1975.000 2004.917 12.000
##
## $class
## [1] "ts"
start(gw)
## [1] 1975 1
end(gw)
## [1] 2004 12
The example above also uses the start and end functions to display
the starting and ending dates of the time series.
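An equivalent specification, starting again from the raw vector, uses deltat instead of frequency (a sketch, not in the original):

gw.alt <- ts(scan("./ds_tsa/anatolia_hati.txt"), start=c(1975,1), deltat=1/12)
str(gw.alt)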
Task 5 : Print the time series; also show the time associated with
each measurement, and the position of each observation in the cycle. •
The generic print method specializes into print.ts to show the actual
values; time shows the time associated with each measurement, and
cycle shows the position of each observation in the cycle.
print(gw)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct
1975 34.36 34.45 34.70 34.80 34.88 35.16 35.60 35.86 35.86 35.70
1976 35.22 35.18 34.98 35.20 35.51 35.32 34.45 34.54 34.39 34.18
...
2003 53.32 52.48 51.37 51.07 50.71 52.78 54.35 55.46 56.52 55.70
2004 53.08 52.41 51.71 51.66 52.84 54.11 55.28 56.11 57.02 55.96
Nov Dec
1975 35.48 35.28
1976 33.92 33.73
...
2003 54.47 54.21
2004 55.55 54.83
Note how the month names are automatically assigned. From the ts doc-
umentation: “Values of 4 and 12 are assumed in print methods to imply a
quarterly and monthly series respectively.” Specifically print.ts makes
this assumption.
time(gw)
...
cycle(gw)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1975 1 2 3 4 5 6 7 8 9 10 11 12
1976 1 2 3 4 5 6 7 8 9 10 11 12
...
2003 1 2 3 4 5 6 7 8 9 10 11 12
2004 1 2 3 4 5 6 7 8 9 10 11 12
Q4 : What are the units of each of these? Jump to A4 •
frequency(gw)
## [1] 12
deltat(gw)
## [1] 0.08333333
par(mfrow=c(1,3))
# pdf("AnatoliaWell1.pdf", width=10, height=5)
plot(gw, ylab="Depth to water table (m)", main="Anatolia Well 1")
# dev.off()
plot(gw, type="o", pch=20, cex=0.6, col="blue",
ylab="Depth to water table (m)", main="Anatolia Well 1")
plot(gw, type="h", col="blue", ylab="Depth to water table (m)",
main="Anatolia Well 1")
par(mfrow=c(1,1))
[Figure: Anatolia Well 1 — depth to water table (m), 1975–2005, plotted three ways: as a line, as a line with points, and as vertical bars]
Task 8 : Plot three cycles of the time series, from the shallowest depth
at the end of the winter rains (April) beginning in 1990, to see annual
behaviour. •
We use the window function to extract just part of the series. The start
and end can each have both a cycle (here, year) and a position in the cycle (here,
month), combined with the c 'concatenate' function:
window(gw, start=c(1990,4), end=c(1993,3))
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1990 39.07 40.16 42.50 43.20 44.11 43.79 43.34
## 1991 41.44 40.85 40.74 40.52 41.27 42.11 43.88 45.09 45.06 44.22
## 1992 42.42 41.46 40.82 40.82 41.73 42.19 43.26 44.49 44.28 43.54
## 1993 42.13 41.66 41.28
## Nov Dec
## 1990 42.61 41.92
## 1991 43.16 42.48
## 1992 42.05 42.48
## 1993
[Figure: Anatolia Well 1 — depth to water table (m) for the three extracted cycles, April 1990 – March 1993]
This region of Turkey has a typical Mediterranean climate: hot and dry
in the summer, cool and moist in the winter. These wells are used for
irrigation in the summer months.
The window function may also be used to extract a single element, by
specifying the same start and end date:
window(gw, start=c(1990,1), end=c(1990,1))
## Jan
## 1990 39.95
Task 9 : Compute and view the difference for lag 1 (one month), lag 2
(two months), and lag 12 (one year), for the period 1990 – 1992. •
The diff function computes differences, by default for successive mea-
surements; the lag argument specifies different lags (intervals between
measurements):
window(gw, 1990, c(1992,12))
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1990 39.95 39.89 39.38 39.07 40.16 42.50 43.20 44.11 43.79 43.34
## 1991 41.44 40.85 40.74 40.52 41.27 42.11 43.88 45.09 45.06 44.22
## 1992 42.42 41.46 40.82 40.82 41.73 42.19 43.26 44.49 44.28 43.54
## Nov Dec
## 1990 42.61 41.92
## 1991 43.16 42.48
## 1992 42.05 42.48
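The code that produced the three sets of differences printed below is not visible in the extracted text; it was presumably of this form (a sketch):

diff(window(gw, 1990, c(1992,12)), lag=1)
diff(window(gw, 1990, c(1992,12)), lag=2)
diff(window(gw, 1990, c(1992,12)), lag=12)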
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1990 -0.06 -0.51 -0.31 1.09 2.34 0.70 0.91 -0.32 -0.45
## 1991 -0.48 -0.59 -0.11 -0.22 0.75 0.84 1.77 1.21 -0.03 -0.84
## 1992 -0.06 -0.96 -0.64 0.00 0.91 0.46 1.07 1.23 -0.21 -0.74
## Nov Dec
## 1990 -0.73 -0.69
## 1991 -1.06 -0.68
## 1992 -1.49 0.43
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1990 -0.57 -0.82 0.78 3.43 3.04 1.61 0.59 -0.77
## 1991 -1.17 -1.07 -0.70 -0.33 0.53 1.59 2.61 2.98 1.18 -0.87
## 1992 -0.74 -1.02 -1.60 -0.64 0.91 1.37 1.53 2.30 1.02 -0.95
## Nov Dec
## 1990 -1.18 -1.42
## 1991 -1.90 -1.74
## 1992 -2.23 -1.06
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1991 1.49 0.96 1.36 1.45 1.11 -0.39 0.68 0.98 1.27 0.88
## 1992 0.98 0.61 0.08 0.30 0.46 0.08 -0.62 -0.60 -0.78 -0.68
## Nov Dec
## 1991 0.55 0.56
## 1992 -1.11 0.00
Q9 : Do you expect the one-month differences to be the same for each
month within a year? Are they? Jump to A9 •
Q10 : Do you expect the one-month differences to be the same for the
same month interval in different years? Are they? Jump to A10 •
Task 10 : Compute the first, second and third order differences of the
series from 1990 – 1993. •
diff(window(gw, 1990, c(1993,12)), lag=12, differences=1)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1991 1.49 0.96 1.36 1.45 1.11 -0.39 0.68 0.98 1.27 0.88
## 1992 0.98 0.61 0.08 0.30 0.46 0.08 -0.62 -0.60 -0.78 -0.68
## 1993 -0.29 0.20 0.46 0.47 0.60 1.21 1.62 1.24 1.50 1.62
## Nov Dec
## 1991 0.55 0.56
## 1992 -1.11 0.00
## 1993 2.57 1.45
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1992 -0.51 -0.35 -1.28 -1.15 -0.65 0.47 -1.30 -1.58 -2.05 -1.56
## 1993 -1.27 -0.41 0.38 0.17 0.14 1.13 2.24 1.84 2.28 2.30
## Nov Dec
## 1992 -1.66 -0.56
## 1993 3.68 1.45
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1993 -0.76 -0.06 1.66 1.32 0.79 0.66 3.54 3.42 4.33 3.86
## Nov Dec
## 1993 5.34 2.01
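The second- and third-order differences shown above were presumably produced by the analogous calls, which are not visible in the extracted text (a sketch):

diff(window(gw, 1990, c(1993,12)), lag=12, differences=2)
diff(window(gw, 1990, c(1993,12)), lag=12, differences=3)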
Task 11 : Plot the first differences for the whole time series. •
plot(diff(gw), ylab="One-month differences in groundwater depth (m)",
main="Anatolia well 1")
[Figure: Anatolia well 1 — one-month differences in groundwater depth (m) vs. time]
This gives us a very different view of the sequence, compared to the time
series itself.
Q12 : What are the outstanding features of the first differences? Jump
to A12 •
Task 12 : Identify the months with the largest extractions and recharges.
•
The which function identifies the elements of a vector which meet some
criterion; which.min and which.max are shorthand for finding the
positions of the minimum and maximum.
We first find the most extreme extraction and recharge, and then all
months where the level changed by more than 2 m in one month, in either
direction:
i <- which.min(diff(gw))
diff(gw)[i]
## [1] -7.36
time(gw)[i]
## [1] 1988.167
cycle(gw)[i]
## [1] 3
i <- which.max(diff(gw))
diff(gw)[i]
## [1] 2.74
time(gw)[i]
## [1] 2000.417
cycle(gw)[i]
## [1] 6
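The step that selects all months with more than a 2 m change appears to have been elided; it was presumably of this form (a sketch):

i <- which(abs(diff(gw)) > 2)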
time(gw)[i]
cycle(gw)[i]
## [1] 3 5 1 7 6 7 5 5
Note the use of the time function on the series to get the dates corre-
sponding to the selected measurements; the resulting vector of dates is
subsetted with the [] indexing operator. Similarly, the cycle function
is used to display the position in the cycle, here the month.
The extreme recharge (about 7.5 m) in March 1988 draws our attention;
is this correct, given that no other month in the time series had more
than about 2.5 m of recharge?
First, we take a closer look at the measurements for this and surrounding
years:
plot(window(gw, start=1986, end=1990), ylab="Groundwater depth (m)",
main="Anatolia well 1", type="h", col="blue")
lines(window(gw, start=1986, end=1990), lty=2)
[Figure: Anatolia well 1 — groundwater depth (m), 1986–1990, shown as vertical bars with a dashed connecting line]
Q13 : Is the pattern for Spring 1987 – Spring 1988 consistent with the
surrounding years? Jump to A13 •
Here are the actual measurements for the relevant time period:
window(gw, start=c(1987,3), end=c(1988,6))
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1987 39.64 39.43 40.12 41.39 42.73 43.58 43.49 44.18
## 1988 46.06 46.42 46.78 39.42 39.96 40.58
## Nov Dec
## 1987 44.94 45.53
## 1988
Q15 : What could be some research questions for this time series?
Jump to A15 •
This is a nice format to view the data but not to analyze as a time series.
First, to quote the R Data Import/Export manual [17, §8]:
Note: However, recently several packages have been written which work
well to exchange Excel files and R data structures, for example the readxl
package, which has an excel_sheets function to list the sheets in an
Excel workbook, and a read_excel function to read an Excel sheet as a
data frame.
BahirDar,,,,,,,,,,,,,
⁶ Using the Numbers application on Mac OS X 10.6
YEAR,DATE,JAN,FEB,MAR,APRIL,MAY,JUNE,JULY,AUG,SEP,OCT,NOV,DEC
1981,1,0,0,0,0,0,0.8,26.3,14.9,13.4,0,1.4,0
,2,0,0,0,0,0,0,12.9,31.3,0.3,0.9,0,0
...
,28,0.0,0.0,0.0,0.0,28.0,33.2,28.1,0.9,36.0,0.5,0,
,29,0.0,,0.0,0.0,0.5,24.0,18.5,2.8,6.8,32.7,0,
,30,0.0,,0.0,0.0,0.0,3.6,32.1,11.5,1.0,3,0,
,31,0.0,,0.0,,15.0,,85.7,0.6,,0,,
Task 14 : Read the CSV file into an R object and examine its structure.
•
We use the very flexible read.csv function, which is a version of the
more general read.table method. These have quite some useful op-
tional arguments (see ?read.table for details); here we use:
• skip=1 to skip over the first line, which is just the station name;
• header=T to specify that the first line read (after skipping) contains
the variable names, here the months;
• colClasses to specify the data type of each field (column); this
is because we want to treat all the rainfall amounts as character
strings to fix the “trace amount” entries specified with the string
TR;
• na.strings="N.A" to specify that any string that is identically N.A
is a missing value; note blanks are also considered missing.
• blank.lines.skip=T to skip blank lines.
Note that optional arguments sep and dec can be used if the separa-
tor between fields is not the default “,” or the decimal point is not the
default “.”.
tana <- read.csv("./ds_tsa/Tana_Daily.csv", skip=1, header=T,
colClasses=c(rep("integer",2), rep("character",12)),
blank.lines.skip=T,na.strings=c("N.A","NA"," "))
str(tana)
tana[1:35,]
## YEAR DATE JAN FEB MAR APRIL MAY JUNE JULY AUG SEP OCT NOV DEC
## 1 1981 1 0 0 0 0 0 0.8 26.3 14.9 13.4 0 1.4 0
## 2 NA 2 0 0 0 0 0 0 12.9 31.3 0.3 0.9 0 0
## 3 NA 3 0 0 0 0 0 0 8.9 0.4 1.5 17.6 0 0
## 4 NA 4 0 0 0 0 0 0 29.6 27.6 0.4 0 0 0
## 5 NA 5 0 0 0 0 0 10.8 16.2 7.8 9.5 0.3 0 0
## 6 NA 6 0 0 0 0 0 5.2 5.3 16.2 5.5 8.4 0.9 0
## 7 NA 7 0 0 0 0 0 0 7.3 2.9 0.4 3.7 0 0
## 8 NA 8 0 0 0 0 3.7 0 108.7 20.1 1.5 0 0 0
## 9 NA 9 0 0 0 0 1.2 0.2 17.6 8.1 4.9 26.8 0 0
## 10 NA 10 0 0 0 0 0 0 6 0 0 0 0 0
## 11 NA 11 0 0 0 0 0 0 7 1 2.7 0 0 0
## 12 NA 12 0 0 0 0 0 0.2 0 58.9 3.8 0 0 0
## 13 NA 13 0 0 0 0 0 0 0 0.7 8.8 0 0 0
## 14 NA 14 0 0 0 0 0 10.3 2.4 1.5 0 0 0 0
## 15 NA 15 0 0 0 0 7.1 0.2 0.8 30.3 0 0 0 0
## 16 NA 16 0 0 0 0 0 0 28.9 1.1 6.3 0 0 0
## 17 NA 17 0 0 0 0 0 2.2 30 6.5 1.6 0 0 0
## 18 NA 18 0 0 0 0 2.7 0 11.4 23.6 8.8 0 0 0
## 19 NA 19 0 0 0 0.3 1.3 0 21.4 27.7 11.5 0 0 0
## 20 NA 20 0 0 0 0 0 6.9 25.2 0 5.8 0 0 0
## 21 NA 21 0 0 0 0 4.4 3.5 7.5 4.1 3.5 0 0 0
## 22 NA 22 0 0 0 14.7 8.6 19.9 20.7 25 32.3 0 0 0
## 23 NA 23 0 0 0 0.4 0 0.9 21.5 11.3 6.7 0 1.7 0
## 24 NA 24 0 0 0 52.8 0 4.4 48.9 1.2 0 0 3.8 0
## 25 NA 25 0 0 0 0 0 0 21.8 23.5 0.7 0 0.2 0
## 26 NA 26 0 0 0 0.3 16.3 0 32.6 25.8 0.4 0 0 0
## 27 NA 27 0 0 0 0 0 0.5 26.6 0.4 0 0 0 0
## 28 NA 28 0 0 0 0 0 0 0 1.4 0 0 0 0
## 29 NA 29 0 0 0 0 0 11.2 8 2 0 0 0
## 30 NA 30 0 0 0 0 0 64.8 0 0 0 0 0
## 31 NA 31 0 0 0 14.9 1.4 0 0
## 32 NA NA
## 33 1982 1 0 0 0 0 0 0 0.9 6.5 4.9 2.2 0 0
## 34 NA 2 0 0 0 0 0 0 3 11.3 0.2 0.3 0 0
## 35 NA 3 0 0 30 0 0 0.3 11.8 24.6 5.7 <NA> 0 0
We can see that each year has 31 lines followed by one line with only two
NA missing values. Only the first line of each year has the year number
in the first field.
The read.csv function creates a so-called dataframe, which is a matrix
with named fields (columns).
There are many missing values, which can be identified with the is.na function;
the sum function then shows how many there are in total. We also compute
the number of blank records; the na.rm argument is needed to remove
missing values before summing:
sum(is.na(tana[,3:14]))
## [1] 170
sum(tana[,3:14]=="",na.rm=TRUE)
## [1] 457
Note that the empty values are not truly missing; they are for days that do not
exist (e.g. 30 February).
We can see all the values with the unique function, and sort these for
easy viewing with sort. To get all the months as one vector we stack
columns 3 through 14 with the stack function:
head(sort(unique(stack(tana[,3:14])$values)))
tail(sort(unique(stack(tana[,3:14])$values)))
Q17 : What are the meanings of the "" (empty string), 0, 0.0, and "TR"
(also written "tr") values? Jump to A17 •
Let’s see how many of each of the problematic values there are, and how
many zeroes:
sum(tana[,3:14]=="TR", na.rm=TRUE)
## [1] 69
sum(tana[,3:14]=="tr", na.rm=TRUE)
## [1] 2
sum(tana[,3:14]=="0.01", na.rm=TRUE)
## [1] 14
sum(tana[,3:14]=="0", na.rm=TRUE)
## [1] 4920
The trace values are conventionally set to half the measurement preci-
sion, or one order of magnitude smaller, or (since they have very little
effect on rainfall totals) to zero.
Task 15 : Set the trace values and any measurements below 0.1 to
zero. •
For this we use the very useful recode function in the car “Companion
to Applied Regression” package from John Fox [9]. The require function
loads the library if it is not already loaded.
require(car)
for (i in 3:14) {
tana[,i] <- recode(tana[,i], "c('TR','tr','0.01')='0'")
}
head(sort(unique(stack(tana[,3:14])$values)),12)
## [1] "" "0" "0.0" "0.1" "0.2" "0.3" "0.4" "0.5" "0.6" "0.7" "0.8"
## [12] "0.9"
tail(sort(unique(stack(tana[,3:14])$values)),12)
## [1] "9.0" "9.1" "9.2" "9.3" "9.4" "9.5" "9.6" "9.7" "9.8"
## [10] "9.9" "90.5" "95.5"
sum(tana[,3:14]=="TR", na.rm=TRUE)
## [1] 0
sum(tana[,3:14]=="tr", na.rm=TRUE)
## [1] 0
sum(tana[,3:14]=="0.01", na.rm=TRUE)
## [1] 0
sum(tana[,3:14]=="0", na.rm=TRUE)
## [1] 5005
⁷ Apologies to anyone born on this date!
};
str(tana.ppt)
## chr [1:9490] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" ...
## [1] 26
[Figure: Lake Tana daily rainfall (mm) vs. time, with missing observations marked]
There are six years with a few missing observations, and a long series of
missing observations in 1991.
To zoom in on the within-year structure, we display one-year windows
for the years with missing values, with these highlighted, and the most
recent year. The points function is used to place points on top of a bar
graph created with the plot function (with the optional type="h" argu-
ment). To compare several years side-by-side, we compute the maximum
daily rainfall with the max function, and use it to set a common limit with
the optional ylim argument to plot.
Also, the optional extend argument to the window function allows the time series
to be padded out to the requested start and end arguments.
yrs <- c(1982, 1983, 1988, 1989,1991,1998,1999, 2006); ymax <- 0
for (i in yrs) {
ymax <- as.numeric(max(ymax, window(tana.ppt,
start=i, end=i+1, extend=TRUE),
na.rm=T))
}
(ymax <- ceiling(ymax))
## [1] 91
par(mfrow=c(4,2))
for (i in yrs) {
plot(window(tana.ppt, start=i, end=i+1, extend=TRUE),
type="h", ylab="mm", ylim=c(0,ymax));
title(main=paste("Lake Tana rainfall", i),
sub=paste("Annual total:",
sum(as.numeric(window(tana.ppt, start=i, end=i+1)),
na.rm=T)))
abline(h=ymax, col="gray")
points(xy.coords(x=time(window(tana.ppt, start=i, end=i+1, extend=TRUE)),
y=ymax, recycle=T),
pch=ifelse(is.na(window(tana.ppt, start=i, end=i+1, extend=TRUE)),"l",""),
col="red")
grid()
}
par(mfrow=c(1,1))
[Figure: Lake Tana daily rainfall (mm) for 1982, 1983, 1988, 1989, 1991, 1998, 1999 and 2006, with missing observations marked in red; annual totals 894.6, 1257.3, 1314.1, 1595.3, 1429.3, 1422.6, 1474.4 and 1683.2 mm respectively]
Q19 : Does there seem to be a seasonal difference in rainfall? Is it
consistent year-to-year? Jump to A19 •
This series has missing values; for some analyses complete series are
needed. The na.contiguous function finds the longest contiguous se-
quence of values:
str(tana.ppt)
sum(is.na(tana.ppt))
## [1] 127
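The contiguous sub-series examined below is not created in the visible text; a sketch, assuming tana.ppt has already been converted to a numeric ts and the result is stored as tana.ppt.c (the name used in the outputs):

tana.ppt.c <- na.contiguous(tana.ppt)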
frequency(tana.ppt.c)
## [1] 365
head(time(tana.ppt.c))
head(cycle(tana.ppt.c))
## [1] 9 10 11 12 13 14
tail(time(tana.ppt.c))
tail(cycle(tana.ppt.c))
sum(is.na(tana.ppt.c))
## [1] 0
[Figure: the contiguous ("continuous record") portion of the daily rainfall series (mm) vs. time]
Q20 : What is the extent of the contiguous time series? Jump to A20 •
2.3 Answers
Return to Q1 •
A2 : No, it is just a list of numbers. We have to know from the metadata what
it represents. Return to Q2 •
A4 : print shows the actual value of each measurement; here the depth to
groundwater. cycle shows the position of each measurement in its cycle; here
this is the month of the year. time gives the fractional year. Return to Q4 •
A6 :
1. Generally increasing trend over time since about 1983 and lasting till
2002; that is, depth is increasing so the groundwater is further from the
surface;
2. But, significant recharge 1975 – 1978;
3. Clear yearly cycle, but this is not seen in 1975–1978;
4. Severe draw-down in 1988, with rapid recharge.
Return to Q6 •
A8 : The series gets shorter; the differences given are for the end date of
each lag, so that, for example, the one-year lag can only begin in January 1991
(difference with January 1990). Return to Q8 •
A11 : The first differences are the year-to-year differences in the same month.
For example, from January 1990 to January 1991 the groundwater depth in-
creased by
, whereas the difference in the next year (January 1991 to January 1992) was
The second differences are the change in difference from one year to the next;
for example the difference from January 1991 to January 1992, compared to
the difference in the previous year (January 1990 to January 1991) is -0.51
. In this case the second year’s differences were less than the first, i.e., the
groundwater depth didn’t increase as much in the second year (January to
January), compared to the first.
The third differences are the change in two-year differences. Return to Q11 •
A12 :
Return to Q12 •
A13 : No. Not only is the magnitude of extraction and recharge more, the
recharge does not begin until March 1988, whereas in the other years it begins
in August. Note the small recharge in September 1988, which seems to be the
beginning of the recharge (fall and winter rains); but then this is interrupted
by six months of extraction during the (wet) winter. Return to Q13 •
A14 : To explain this there would have to have been continued extraction in
the winter of 1987–1988, perhaps because of failure of winter rains; we do not
have the rainfall data to see if this is possible. But also March 1988 would have
to have had extreme rainfall.
Another possibility is that the records for winter 1987–1988 were incorrectly
entered into the data file, or perhaps measured in a different way, with this
being corrected in March 1988. However, they are consistent month-to-month
in this period, so it is unlikely to be a data entry error. Return to Q14 •
A15 :
Return to Q15 •
A16 : YEAR is the year of measurement; however it is only entered for the first
day of each year (upper-left of the sheet).
DATE is the day of the month (all months), from 1 .. 31
JAN . . . DEC are the rainfall amounts for the given day of the named month.
Return to Q16 •
A17 : "" (empty string) means there is no data at all, i.e., no day in the month
(here, 29 – 31 February); 0 means no rainfall, as does 0.0, and "TR" means
trace rainfall. Return to Q17 •
A18 : Only one decimal place is given; also the minimum measurement for
February was 0.1, so the precision is 0.1. January does have a 0.01 value,
which seems to be an inconsistent attempt to record the trace amount. Return
to Q18 •
A19 : There is a definite seasonality: rains from about May to about August
and the other months dry. Rains can begin early (1993) or there can be some
sporadic rains before the true start of the rainy season (1992, less in 1990).
Return to Q19 •
A20 : From the 9th day of 1999 through the end of 2006, i.e., almost eight
years. Return to Q20 •
3.1 Summaries
Task 18 : Summarize the groundwater levels of the first well, for the
whole series. •
The summary function gives the overall distribution:
summary(gw)
Task 19 : Summarize the groundwater levels of the first well, for each
year separately. •
The time-series data structure (one long vector with attributes) is not
suitable for grouping by year or position in cycle. We create a data frame,
one column being the time series and the other two factors giving the
year and cycle. Recall the time function returns the time of observation
(here, as fractional years); these are converted to year number with the
floor function. Similarly, the cycle function returns the position in
the cycle, from 1 ...frequency(series). Both of these are converted
from time-series to numbers with the as.numeric function. We also
keep the fractional time axis, as field time.
gw.f <- data.frame(gw, year=as.numeric(floor(time(gw))),
cycle=as.numeric(cycle(gw)), time=time(gw))
str(gw.f)
Now the year can be used as a grouping factor; the summary function
(specified with the FUN argument) is applied to each year with the by
function, with the IND “index” argument being the grouping factor:
head(by(gw.f$gw, IND=gw.f$year, FUN=summary))
## $`1975`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34.36 34.77 35.22 35.18 35.62 35.86
##
## $`1976`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 33.73 34.34 34.76 34.72 35.20 35.51
##
## $`1977`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30.61 31.47 31.61 31.81 32.16 33.40
##
## $`1978`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.90 30.07 30.34 30.41 30.75 30.97
##
## $`1979`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30.42 30.53 31.15 31.50 32.45 32.96
##
## $`1980`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 31.60 31.91 32.28 32.37 32.74 33.33
Note that the function applied could be max (to get the deepest level),
min (the shallowest), median, quantile etc., in fact anything that sum-
marizes a numeric vector.
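For example, the deepest level reached in each year could be found by supplying max as the summary function (a sketch, not in the original):

head(by(gw.f$gw, IND=gw.f$year, FUN=max))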
A single year can be extracted with the [[]] list extraction operator:
by(gw.f$gw, IND=gw.f$year, FUN=summary)[["1978"]]
## [1] 30.97
## [1] 29.9
## [1] 30.839
The per-year distributions can also be shown graphically, using a formula with the variable to summarize (here, groundwater level) on the left-hand side and the grouping factor on the right-hand side (here, year):
[Figure: Anatolia well 1 — per-year distribution of groundwater level (m), 1975–2004]
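The annual means printed below do not have their generating code in the extracted text; a sketch, assuming the result is stored as ann.mean (the name used with time(ann.mean) further on), together with a hypothetical call for the per-year display above:

# hypothetical call for the per-year display above (not from the original)
boxplot(gw.f$gw ~ gw.f$year, main="Anatolia well 1", ylab="groundwater level (m)")
# aggregate the monthly series to annual means; nfrequency=1 gives one value per year
ann.mean <- aggregate(gw, nfrequency=1, FUN=mean)
ann.mean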
## Time Series:
## Start = 1975
## End = 2004
## Frequency = 1
## [1] 35.17750 34.71833 31.80750 30.41083 31.50333 32.36917 31.28667
## [8] 31.89667 34.50333 36.51417 38.27750 40.09167 42.08583 42.10417
## [15] 40.78417 41.66000 42.56833 42.46167 43.51583 45.49667 47.74917
## [22] 46.78500 44.99417 45.89000 47.85417 49.62000 52.94750 54.59083
## [29] 53.53667 54.21333
time(ann.mean)
## Time Series:
## Start = 1975
## End = 2004
## Frequency = 1
## [1] 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987
## [14] 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
## [27] 2001 2002 2003 2004
We subtract the correct annual mean from each observation, using the
match function to find the position in the vector of annual means that
corresponds to the year of the observation:
gw.f$in.yr <- as.numeric(gw - ann.mean[match(gw.f$year, time(ann.mean))])
str(gw.f)
[Figure: deviation from the annual mean (m), by month (1–12)]
Q22 : Is there an annual cycle? Are all the months equally variable? Are
there exceptions to the pattern? Jump to A22 •
3.2 Smoothing
3.2.1 Aggregation
Task 22 : Convert the monthly time series of well levels from 1980
through 1985 into a quarterly series of mean well levels for the quarter.
•
The aggregation function here could be mean; the default sum has
no physical meaning for this variable. Other reasonable choices, depending on the objective,
would be max, min, or median.
gw.q <- aggregate(window(gw, 1980, 1986), nfrequency=4, FUN=mean)
str(gw.q)
## Time-Series [1:24] from 1980 to 1986: 32.2 31.7 33 32.6 31.7 ...
par(mfrow=c(1,2))
plot(gw.q, ylab="Depth to water table (m)",
main="Anatolia Well 1, quarterly", type="b")
plot(window(gw, 1980, 1986), ylab="Depth to water table (m)",
main="Anatolia Well 1, monthly", type="b")
par(mfrow=c(1,1))
[Figure: Anatolia Well 1 — depth to water table (m), quarterly series (left) and monthly series (right), 1980–1986]
Q23 : What are the differences between the monthly and quarterly se-
ries? How much resolution is lost? Jump to A23
•
3.2.2 Smoothing by filtering
A simple way to smooth the series is to apply a linear filter, using the
filter function. By default this uses a moving average of values on
both sides (before and after) each measurement. The method requires a
filter, which is a vector of coefficients to provide the weights, in reverse
time order.
The moving average for a given measurement is a weighted sum of 2p + 1
measurements: the preceding p, the original measurement itself, and the
following p measurements; 2p + 1 is called the order of the filter. Note
that filtering shortens the usable series, because the filtered value cannot be computed
for the first p and last p measurements:
$$s_t = \sum_{j=-p}^{p} w_j\, y_{t+j}, \qquad t = p+1, \ldots, n-p \qquad (1)$$

The weights $w_j$ must sum to 1; generally $w_{-j} = w_{+j}$ (symmetry), but this
is not required.
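As a concrete illustration of equation (1), a minimal sketch (not in the original) of an order-3 equal-weight moving average using the filter function:

# weights for an order-3 (p = 1) symmetric moving average; they sum to 1
w <- rep(1, 3)/3
gw.ma3 <- filter(gw, filter=w, sides=2)
# the first and last values are NA, since the filter cannot be computed there
head(gw.ma3); tail(gw.ma3)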
[Figure: Anatolia Well 1, 7-month filter — groundwater depth (m) vs. time]
Task 24 : Repeat the seven-month filter with equal weights for each
month; also compute an annual filter (12 months equally weighted); plot
the three filtered series together for the period 1990–1995. •
fgw.2 <- filter(gw, sides=2, rep(1,7)/7)
fgw.3 <- filter(gw, sides=2, rep(1,12)/12)
plot.ts(gw, xlim=c(1990,1995), ylim=c(37,48),
ylab="groundwater depth (m)")
title(main="Anatolia Well 1, three filters")
lines(fgw, col="blue", lty=2)
lines(fgw.2, col="red")
lines(fgw.3, col="magenta")
text(1995,40,"1/4,1/2,1 filter", col="blue", pos=2)
text(1995,39,"1,1,1 filter", col="red",pos=2)
text(1995,38,"annual filter", col="magenta",pos=2)
[Figure: Anatolia Well 1, three filters — the series for 1990–1995 with the 1/4,1/2,1 filter, the equal-weight (1,1,1) 7-month filter, and the annual filter]
[Figure: Anatolia Well 1, annual filter — groundwater depth (m) vs. time]
Another way to visualize the trend is to use the lowess “Local Polyno-
mial Regression Fitting” method [6], which fits the data points locally,
using nearby (in time) points. These are weighted by their distance (in
time) from the point to be smoothed; the degree of smoothing is con-
trolled by the size of the neighbourhood. This results in a smooth curve.
Task 26 : Display the time series and its smoothed series for the default
smoothing parameter (2/3), and three other values of the parameter, one
smoother, one finer, and one very fine (little smoothing). •
plot.ts(gw, main="Anatolia well 1 with smoothers",
ylab="groundwater depth (m)")
lines(lowess(gw), col="blue")
lines(lowess(gw, f=1), col="green")
lines(lowess(gw, f=1/3), col="red")
lines(lowess(gw, f=1/10), col="purple")
text(1990, 36, "Smoothing parameter: 2/3 (default)", col="blue", pos=4)
text(1990, 34, "Smoothing parameter: 1", col="green", pos=4)
text(1990, 32, "Smoothing parameter: 1/3", col="red", pos=4)
text(1990, 30, "Smoothing parameter: 1/10", col="purple", pos=4)
[Figure: Anatolia well 1 with lowess smoothers — smoothing parameters 2/3 (default), 1, 1/3 and 1/10]
3.3 Decomposition
The workhorse function for decomposition is stl “Seasonal Decomposi-
tion of Time Series by Loess”, i.e., using a similar smooth trend removal
as the lowess function used above in §3.2.3. This has one required argu-
ment, s.window, which is the (odd) number of lags for the loess window
for seasonal extraction; for series that are already defined to be cyclic
(as here), this can be specified as s.window="periodic", in which case
the cycle is known from the attributes of the time series, extracted here
with the frequency function:
frequency(gw)
## [1] 12
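The per-month (cycle) means printed below are not computed in the visible text; a likely form of the elided step (a sketch):

tapply(gw, cycle(gw), mean)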
## 1 2 3 4 5 6 7
## 41.01700 40.59800 40.25900 39.82733 40.38933 41.22433 42.07233
## 8 9 10 11 12
## 43.03767 43.30633 42.93767 42.33233 41.96433
This is subtracted from each value, leaving just the non-seasonal compo-
nent. Here we show two years’ adjustments numerically, and the whole
series’ adjustment graphically:
head(gw, 2*frequency(gw))
## [1] 34.36 34.45 34.70 34.80 34.88 35.16 35.60 35.86 35.86 35.70 35.48
## [12] 35.28 35.22 35.18 34.98 35.20 35.51 35.32 34.45 34.54 34.39 34.18
## [23] 33.92 33.73
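The adjusted values printed next appear to be these first two years minus the corresponding cycle means; a sketch of such a computation:

head(gw, 2*frequency(gw)) - rep(tapply(gw, cycle(gw), mean), 2)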
## 1 2 3 4 5 6 7
## -6.657000 -6.148000 -5.559000 -5.027333 -5.509333 -6.064333 -6.472333
## 8 9 10 11 12 1 2
## -7.177667 -7.446333 -7.237667 -6.852333 -6.684333 -5.797000 -5.418000
## 3 4 5 6 7 8 9
## -5.279000 -4.627333 -4.879333 -5.904333 -7.622333 -8.497667 -8.916333
## 10 11 12
## -8.757667 -8.412333 -8.234333
par(mfrow=c(2,1))
plot(gw, ylab="depth to groundwater", main="Original series")
plot(gw-rep(tapply(gw, cycle(gw), mean), length(gw)/frequency(gw)),
ylab="difference from cycle mean", main="Seasonally-corrected series")
abline(h=0, lty=2)
par(mfrow=c(1,1))
[Figure: original series (depth to groundwater) and seasonally-corrected series (difference from cycle mean) vs. time]
Q28 : What has changed in the numeric and graphical view of the time
series, after adjustment for cycle means? Jump to A28 •
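The structure printed below is that of an stl object; its stored call shows s.window = "periodic", so the elided step was presumably:

gw.stl <- stl(gw, s.window="periodic")
str(gw.stl)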
## List of 8
## $ time.series: Time-Series [1:360, 1:3] from 1975 to 2005: -0.271 -0.75 -1.149 -1.635 -1.
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:3] "seasonal" "trend" "remainder"
## $ weights : num [1:360] 1 1 1 1 1 1 1 1 1 1 ...
## $ call : language stl(x = gw, s.window = "periodic")
## $ win : Named num [1:3] 3601 19 13
## ..- attr(*, "names")= chr [1:3] "s" "t" "l"
## $ deg : Named int [1:3] 0 1 1
## ..- attr(*, "names")= chr [1:3] "s" "t" "l"
## $ jump : Named num [1:3] 361 2 2
## ..- attr(*, "names")= chr [1:3] "s" "t" "l"
## $ inner : int 2
## $ outer : int 0
## - attr(*, "class")= chr "stl"
## [1] 0
rm(tmp)
[Figure: stl decomposition of gw — seasonal, trend and remainder components vs. time]
The components of the decomposed series can be extracted; for example
to see just the trend:
plot(gw.stl$time.series[,"trend"],
main="Anatolia well 1, trend",
ylab="Groundwater level (m)")
[Figure: Anatolia well 1, trend component — groundwater level (m) vs. time]
Another way to see the decomposition is with the ts.plot function; this
shows several time series (here, the components) on the same scale of
a single graph, thus visualizing the relative contribution of each compo-
nent:
ts.plot(gw.stl$time.series, col=c("black","blue","red"),
main="Anatolia well 1, decomposition",
ylab="Groundwater level (m)")
tmp <- attributes(gw.stl$time.series)$dimnames[[2]]
for (i in 1:3) {
text(1995, 24-(i*4), tmp[i], col=c("black","blue","red")[i], pos=4)
}
grid()
[Figure: Anatolia well 1, decomposition — seasonal, trend and remainder components on a common scale]
Task 29 : Decompose the groundwater time series, with a two-year
window for the seasonal component. •
gw.stl <- stl(gw, s.window=2*frequency(gw)+1)
plot(gw.stl)
[Figure: stl decomposition with a 25-month seasonal window — data, seasonal, trend and remainder]
Q31 : How does this decomposition differ from the pure periodic de-
composition? Jump to A31
•
The smoothness of the lowess fit is controlled with the t.window argu-
ment; by default this is:
nextodd(ceiling((1.5*period) / (1-(1.5/s.window))))
so that for a 25-month seasonal window on the 12-month cycle of this
example, the trend window is the next odd number above
⌈(1.5 × 12)/(1 − (1.5/25))⌉ = 20, i.e., 21.
Note: This is for the case when the period is explicitly given. If the
window is specified as s.window="periodic" the smoothness parameter
is one more than 1.5 times the cycle length, see 3.3.
For a smoother trend this should be increased, for a finer trend de-
creased. The smoothness of the trend depends on the analyst’s knowl-
edge of the process. The previous decomposition has a very rough trend,
let’s see how it looks with a smoother trend.
[Figure: stl decomposition with a smoother trend — seasonal, trend and remainder]
Q32 : What is the difference between this smooth and the default de-
compositions? Which best represents the underlying process? Jump to
A32 •
This is the autocorrelation, an example of a second-order summary (the first-order summary is the expected value: either a constant mean or a trend).
The first way to visualize this is to produce scatterplots of measurements
compared to others at various lags, i.e., time periods before
the given measurement. The lag.plot function produces this.
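The figure below was presumably produced by a call to lag.plot; a sketch (the number of lags displayed is an assumption):

lag.plot(gw, lags=9, main="Anatolia well 1")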
[Figure: Anatolia well 1 — lag.plot scatterplots of gw against lagged values of itself]
Note that positive values of the lags argument refer to lags before an
observation. This is also used in the lag function, which produces a
lagged series with the same indices as the original series, i.e., the series
is not shifted.
window(gw, 2000, 2001)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 2000 48.07 47.75 47.26 46.84 47.73 48.24 50.98 51.71 52.38 52.39
## 2001 50.11
## Nov Dec
## 2000 51.42 50.67
## 2001
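The shifted series printed below do not have their generating code in the extracted text; from the outputs they appear to be lagged versions of the one-year window, for k = 1, 2, −1 and −2 (a sketch):

lag(window(gw, 2000, 2001), k=1)
lag(window(gw, 2000, 2001), k=2)
lag(window(gw, 2000, 2001), k=-1)
lag(window(gw, 2000, 2001), k=-2)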
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1999
## 2000 47.75 47.26 46.84 47.73 48.24 50.98 51.71 52.38 52.39 51.42
## Nov Dec
## 1999 48.07
## 2000 50.67 50.11
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1999
## 2000 47.26 46.84 47.73 48.24 50.98 51.71 52.38 52.39 51.42 50.67
## Nov Dec
## 1999 48.07 47.75
## 2000 50.11
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 2000 48.07 47.75 47.26 46.84 47.73 48.24 50.98 51.71 52.38
## 2001 50.67 50.11
## Nov Dec
## 2000 52.39 51.42
## 2001
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 2000 48.07 47.75 47.26 46.84 47.73 48.24 50.98 51.71
## 2001 51.42 50.67 50.11
## Nov Dec
## 2000 52.38 52.39
## 2001
By default if the time series is longer than 150 (as here), individual mea-
surements are not labelled nor joined by lines. For shorter series they
are.
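The labelled lag plot below was presumably produced by a call of this form (a sketch; the number of lags is an assumption):

lag.plot(window(gw, start=1990, end=1993), lags=9)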
[Figure: lag.plot of window(gw, start = 1990, end = 1993) against lagged values of itself, with the individual observations numbered and joined by lines]
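The printed autocorrelations below do not have their generating code in the extracted text; presumably something like (a sketch, mirroring the call used later for gw.r):

print(acf(gw, plot=FALSE))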
##
## Autocorrelations of series 'gw', by lag
##
## 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500
## 1.000 0.988 0.970 0.950 0.930 0.914 0.903 0.898 0.898 0.901
## 0.8333 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833
## 0.906 0.907 0.903 0.891 0.874 0.854 0.834 0.818 0.807 0.800
## 1.6667 1.7500 1.8333 1.9167 2.0000 2.0833
## 0.799 0.800 0.803 0.803 0.797 0.784
[Figure: "Autocorrelation, groundwater levels, Anatolia well 1" — the ACF of gw, shown with the default lag axis and again for lags 0–5 years]
[Figure: time series plot of the remainder series (values roughly −2 to 2) vs. time]
print(acf(gw.r, plot=F))
##
## Autocorrelations of series 'gw.r', by lag
##
## 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500
## 1.000 0.887 0.769 0.659 0.559 0.468 0.385 0.324 0.270 0.231
## 0.8333 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833
## 0.193 0.147 0.098 0.056 0.001 -0.058 -0.111 -0.169 -0.222 -0.271
## 1.6667 1.7500 1.8333 1.9167 2.0000 2.0833
## -0.314 -0.345 -0.357 -0.370 -0.398 -0.397
[Figure: autocorrelation function of the remainder series gw.r]
The blue dashed lines show correlations that are not provably different
from zero.
The partial and ordinary autocorrelations for lag 1 are the same.
Partial autocorrelations are computed as follows:
$$P_k \phi_k = \rho_k \qquad (2)$$

$$P_k = \begin{bmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{k-1}\\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{k-2}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & 1
\end{bmatrix} \qquad (3)$$

$$\rho_j = \phi_{k,1}\,\rho_{j-1} + \cdots + \phi_{k,k}\,\rho_{j-k}, \qquad j = 1, 2, \ldots, k \qquad (4)$$

where $\phi_{k,j}$ is coefficient $j$ of an autoregressive process of order $k$; the partial autocorrelation at lag $k$ is $\phi_{k,k}$.
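The partial autocorrelations printed below do not have their generating code in the extracted text; presumably a call like (a sketch):

print(pacf(gw, plot=FALSE))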
##
## Partial autocorrelations of series 'gw', by lag
##
## 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333
## 0.988 -0.242 -0.047 0.033 0.151 0.127 0.159 0.123 0.128 0.014
## 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667
## -0.070 -0.124 -0.179 -0.086 -0.053 0.019 0.061 0.029 0.021 0.064
## 1.7500 1.8333 1.9167 2.0000 2.0833
## 0.010 0.038 -0.049 -0.081 -0.085
[Figure: partial autocorrelation function of gw]
As with the graph produced by acf, the blue dashed lines show correla-
tions that are not provably different from zero.
Q37 : What are the partial autocorrelations that are different from zero?
Jump to A37 •
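Similarly for the remainder series, the elided call was presumably (a sketch):

print(pacf(gw.r, plot=FALSE))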
##
## Partial autocorrelations of series 'gw.r', by lag
##
## 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333
## 0.887 -0.082 -0.028 -0.022 -0.025 -0.021 0.042 -0.015 0.031 -0.032
## 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667
## -0.064 -0.043 -0.004 -0.103 -0.062 -0.031 -0.091 -0.046 -0.058 -0.057
## 1.7500 1.8333 1.9167 2.0000 2.0833
## -0.016 0.014 -0.067 -0.121 0.084
[Figure: partial autocorrelation function of gw.r]
Q38 : What are the partial autocorrelations that are different from zero?
How does this differ from the partial autocorrelations of the original
series? What is the interpretation? Jump to A38 •
3.6 Spectral analysis
$$\gamma_t = \frac{1}{2\pi}\int_{-1/2}^{+1/2} e^{2\pi i \omega_f t}\, f(2\pi\omega_f)\, d\omega_f$$

where $\omega_f$ is the frequency expressed as cycles per unit of time.
This can be inverted to give the density at each frequency:

$$f(\omega) = \gamma_0\left[1 + 2\sum_{t=1}^{\infty} \rho_t \cos(\omega t)\right]$$

The spectral density is estimated from the observations by the periodogram:

$$I(\omega) = \frac{1}{n}\left|\sum_{t} e^{-i\omega t}\, X_t\right|^{2}$$
The raw spectrum is too noisy to interpret, so it is usually smoothed with
so-called Daniell windows, which give half-weight to the end values. The
window widths are specified with the spans optional argument; suitable
values are found by trial and error, until the main features of the periodogram are
revealed.
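The structure printed below shows a modified Daniell(2,3) kernel, consistent with spans=c(5,7); the elided step was presumably of this form (a sketch):

s <- spectrum(gw, spans=c(5,7))
str(s)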
## List of 16
## $ freq : num [1:180] 0.0333 0.0667 0.1 0.1333 0.1667 ...
## $ spec : num [1:180] 7.5 7.28 6.92 6.43 5.62 ...
## $ coh : NULL
## $ phase : NULL
## $ kernel :List of 2
## ..$ coef: num [1:6] 0.1667 0.1562 0.125 0.0833 0.0417 ...
## ..$ m : int 5
## ..- attr(*, "name")= chr "mDaniell(2,3)"
## ..- attr(*, "class")= chr "tskernel"
## $ df : num 14.3
## $ bandwidth: num 0.0726
## $ n.used : int 360
## $ orig.n : int 360
## $ series : chr "x"
## $ snames : NULL
## $ method : chr "Smoothed Periodogram"
## $ taper : num 0.1
## $ pad : num 0
## $ detrend : logi TRUE
## $ demean : logi FALSE
## - attr(*, "class")= chr "spec"
head(s$spec, n=12)
[Figure: smoothed periodogram of gw ("Series: x") — spectrum (log scale) vs. frequency 0–6; bandwidth = 0.0726]
Note that the spectrum is displayed on logarithmic scale in the plot. The
blue vertical line (upper right corner) gives the 95% confidence interval
that the reported density is different from zero.
The x-axis of the spectrum is in cycles per base period (here, per year), i.e., the
inverse of the period. For example, "1" is one full cycle per year, "2" is two cycles
per year (a half-year period), "3" is three cycles per year (a one-third year period), etc.; these are the harmonics. So in this
example, with ω = 12, the spectral density at x = 1 is for one cycle per year (12
months, one year) and the density at x = 2 is for a half-year cycle (6 months).
The resolution of the decomposition is determined by the length of the
time series (here, 360 observations): the spectral decomposition has
half this many frequencies (here, 360/2 = 180). The frequency spacing is one over
the number of complete cycles in the series (here, 360/12 = 30 annual cycles, giving a spacing of 1/30),
and the highest frequency is half the sampling frequency, here 12/2 = 6 cycles
per year:
frequency(gw)
## [1] 12
length(gw)/frequency(gw)
## [1] 30
head(s$freq,n=length(gw)/frequency(gw))
Q39 : At what frequencies are the largest periodic components of the
Fourier decomposition? What is the interpretation? Jump to A39 •
We find the largest components by sorting the spectrum, using the sort
function with the optional index.return argument to save the positions
of each sorted element in the original vector, and grouping nearby peaks.
ss <- sort(s$spec, decreasing=T, index.return=T)
str(ss)
## List of 2
## $ x : num [1:180] 7.5 7.28 6.92 6.43 5.62 ...
## $ ix: int [1:180] 1 2 3 4 5 6 7 30 29 31 ...
hi <- ss$x>.15
which(hi)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
## [23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
ss$x[hi]
ss$ix[hi]
## [1] 1 2 3 4 5 6 7 30 29 31 8 28 32 9 27 33 10 26 11 34 12 25
## [23] 13 24 35 23 22 14 21 15 20 19 16 18 17 60 61 59 36
sort(s$freq[ss$ix[hi]])
The indices are positions in the frequency vector, so the corresponding frequency is
the index divided by the number of cycles (here, 30). We can see three peaks:
near zero (corresponding to no cycles, i.e., the overall mean), near one (annual),
and centred on two (6-month); this harmonic is not provably different
from zero.
The spectral density is displayed on a log scale by default, in order
to more easily show the lower-power components. The raw density is
shown by specifying the optional log="no" argument; specifying log="dB"
shows the spectrum in decibels, as is conventional in signal
processing.
par(mfrow=c(1,3))
spectrum(gw, spans=c(5,7), log="no", main="Anatolia groundwater level",
sub="annual cycle, smoothed")
grid()
spectrum(gw, spans=c(5,7), log="yes", main="Anatolia groundwater level",
sub="annual cycle, smoothed")
grid()
spectrum(gw, spans=c(5,7), log="dB", main="Anatolia groundwater level",
sub="annual cycle, smoothed")
grid()
par(mfrow=c(1,1))
[Figure: Anatolia groundwater level — smoothed periodogram on linear, log and decibel scales, frequency 0–6]
[Figure: periodogram of the residuals (dB scale), annual cycle smoothed, frequency 0–6]
Notice that there is no peak corresponding to one year; this has been
removed with the annual cycle. There are also no peaks at the harmonics
(1/2, 1/3 etc. years).
Zooming in on the first two frequencies, i.e., one and one-half year, with
two different views:
par(mfrow=c(1,2))
spectrum(gw.r, span=c(5,7), log="no", main="Periodogram, residuals",
sub="annual cycle, smoothed", xlim=c(0,2), type="h")
grid()
s <- spectrum(gw.r, span=c(5,7), log="dB", main="Periodogram, residuals",
sub="annual cycle, smoothed", xlim=c(0,2))
grid()
sp.gw$freq[which.max(s$spec)]
## [1] 0.2333333
frequency(gw)/(s$freq[which.max(s$spec)])
## [1] 51.42857
which.max(s$spec[16:30])
## [1] 8
sp.gw$freq[which.max(s$spec[16:30])+15]
## [1] 0.7666667
frequency(gw)/(s$freq[which.max(s$spec[16:30])+15])
## [1] 15.65217
par(mfrow=c(1,1))
[Figure: periodogram of the residuals for frequencies 0–2, on linear and dB scales, annual cycle smoothed]
Q40 : What are the periods of the highest spectral densities? How can
these be explained physically? Jump to A40 •
3.7 Answers
A21 : Until 1981 the amplitude is small, less than 2 m. It then increases but
there are year-to-year differences. In 2002 the amplitude was highest. Return
to Q21 •
A22 : There is a definite annual cycle: September has the deepest levels (at
the end of the extractive period) and April the shallowest (at the end of the
winter rains). Winter months are a bit more variable than summer. Obvious
outliers from the overall pattern are the deep levels in one December – March
period. Return to Q22 •
A23 : The monthly series has thrice the points and thus more detail; however
the pattern remains clear and the high/low values for each cycle are not much
different, so the lower temporal resolution does not much affect interpretation.
Return to Q23 •
A24 : The peaks and valleys are less extreme, and some month-to-month
irregularities are removed. Return to Q24 •
A25 : The peaks and valleys are further smoothed, month-to-month irregular-
ities are removed. Return to Q25
•
A26 : The annual cycle is removed, this shows the year-to-year variation.
Return to Q26 •
A27 : The default parameter (2/3) shows the overall trend (increasing groundwater
depth) slightly adjusted for local phenomena; increasing the parameter
to 1 removes almost all variation and results in almost a straight line;
decreasing it to 1/3 adjusts more closely to trends that extend over a few years, for
example the initial overall decrease in depth for the first three years, and the
unusual extraction in 1988. The parameter value 1/10 adjusts very closely to
each year and obscures the overall trend. The parameter value of 1/3 seems
most useful here. Return to Q27 •
A28 : Numerically, the mean is zero and the numbers represent deviations
from the cycle at each position in it. Thus at the earlier years the groundwater
is shallower (negative values), later it is deeper (positive values). For most of
the series the seasonal fluctuations have been mostly removed, but prior to
1980 they are amplified. This is because in that period there was not much
seasonal fluctuation, so averaging the cycle over all years amplifies these early
small fluctuations. Return to Q28 •
A29 : The decomposed series has class stl and consists of three named time
series: “seasonal”, “trend”, and “remainder”, organized as a matrix of class mts
(“multiple time series”) with three columns (one for each series). Return to
Q29 •
A30 : The average seasonal component has amplitude ≈ ±1.5 m depth; the
trend ranges over 25 m depth and generally increases, after an initial decrease;
the remainder ranges from ≈ −2 . . . 4.5 m, but all except 1988 are within a
more restricted range, ≈ ±1.5 m. The remainders show strong auto-correlation
within each cycle. Thus the trend is by far the largest component; the seasonal
and remainder are similar orders of magnitude. Return to Q30 •
A31 : The seasonal component can change with time; here the amplitude
increases until about 2000 and then stabilizes. The cycles themselves may
have different shapes: note the “shoulder” in the decreasing level in 1983–
1990, which is absent from later years. The amplitude of the remainder has
been reduced, because the cycles are more closely fit. Return to Q31 •
A32 : The smoother decomposition has a smoother trend (of course); the
seasonal component is the same, so the remainders are larger and their serial
auto-correlation extends over a longer span. The smooth trend seems to repre-
sent a long-term change in groundwater, probably due to increasing extraction
for agriculture. The rough trend has noise that does not seem to be due to a
long-term process. Return to Q32 •
A33 : The auto-correlations get weaker for the first six or seven months, but
then strengthen; this reflects the annual cycles. Return to Q33 •
A34 : The detailed scatterplot shows which measurements are more or less
correlated to the lagged measurement; also the lines show an evolution of this
over time: i.e., whether the correlation is systematically greater or less. Return
to Q34 •
A36 : The remainders are positively autocorrelated within one year (up to 11
months); they are then not different from zero (no correlation) until the 16th
month, after which the autocorrelation is increasingly negative.
The removal of the trend has taken away most of the autocorrelation due to
the continuous nature of groundwater level change. Removal of the cycle has
taken away any autocorrelation due to seasonality (i.e., extraction and recharge
at the same seasons each year), which is reflected in the lack of correlation in
the year to year remainders (lag 12).
The remainder represents the local effects after accounting for trend and cycle.
There is positive autocorrelation within one year, because in a wet (or dry) year
the level can not fluctuate rapidly and so tends to stay relatively high (or low).
The negative autocorrelation between subsequent years means that relatively
wet (dry) years tend to be followed by relatively dry (wet) remainders, i.e., after
accounting for trend and cycle. Return to Q36 •
A38 : The only partial autocorrelation provably different from zero is the first
lag (one month). Once this is accounted for, the other lags have no autocorre-
lation. The remainders have only a one-month autocorrelation, and thus could
be modelled by a first-order autoregressive process (§4.4.1). Return to Q38 •
A39 : At frequency 1 (one year) there is a large component (-9 dB); i.e., a
strong annual cycle. There is also a much weaker component (-16.51 dB) at the
six-month frequency. Return to Q39 •
A40 : The highest spectral density is at 7/30 = 0.23̄ cycles per year, i.e.,
51.4 months per cycle, or about 4 years 3 months. A much smaller peak is
at 23/30 = 0.76̄ cycles per year, i.e., 15.7 months, or about 1 year 3 months.
These seem to be artefacts of the particular data series. Return to Q40 •
4 Modelling a time series
We have already seen (§3.3) that a time series can be decomposed into
a trend, cycle, and residual using the stl function. The residual is by
definition noise in this decomposition; the other two components form a
model, representing the long-term and cyclic behaviour of the series.
The problem with this approach is that the trend removal is empirical:
its smoothness must be adjusted by the analyst. Recall the two decom-
positions of the groundwater data, with different smoothness:
plot(stl(gw, s.window="periodic")$time.series,
main="Well 1, periodic decomposition")
[Figure: 'Well 1, periodic decomposition' — seasonal, trend, and remainder components]
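The second figure below is the smoother decomposition used earlier; its plot command is not reproduced in this extract, but it was presumably along these lines (the s.window and t.window values are read off the figure title):

plot(stl(gw, s.window=25, t.window=85)$time.series,
     main="Anatolia well 1, s=25, t=85 decomposition")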
[Figure: 'Anatolia well 1, s=25, t=85 decomposition' — seasonal, trend, and remainder components]
(prediction).
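The summary below is of the linear trend fit. A minimal sketch of that fit, assuming the data frame gw.f (with the series gw and its time index time) was built earlier, is:

m.gw <- lm(gw ~ time, data=gw.f)   # OLS linear trend of level on time
summary(m.gw)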
##
## Call:
## lm(formula = gw ~ time, data = gw.f)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.9134 -1.8445 -0.3193 1.4857 6.6562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.576e+03 3.053e+01 -51.64 <2e-16 ***
## time 8.130e-01 1.534e-02 53.00 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.521 on 358 degrees of freedom
## Multiple R-squared: 0.887,Adjusted R-squared: 0.8867
## F-statistic: 2809 on 1 and 358 DF, p-value: < 2.2e-16
Task 41 : Plot the time series, with the fits from the linear model
superimposed. •
plot(gw.f$time, gw.f$gw, type="l", col="darkgreen",
ylab="Groundwater level (m)", xlab="Year")
title("Anatolia well 1, linear trend")
lines(x=as.numeric(gw.f$time), y=fitted(m.gw), col="red")
[Figure: 'Anatolia well 1, linear trend' — the series with the OLS fit superimposed]
Note that the adjusted R² and slope can be extracted from the model
object:
summary(m.gw)$adj.r.squared
## [1] 0.8866509
coefficients(m.gw)["time"]
## time
## 0.8130209
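The diagnostic plot below was presumably produced with the standard plot method for linear models, e.g.:

plot(m.gw, which=1)   # residuals vs. fitted values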
[Figure: 'Residuals vs Fitted' diagnostic plot for lm(gw ~ time); observations 157–159 are flagged as extreme]
The diagnostic plot shows that this model violates one assumption of
linear modelling: there should be no pattern to the residuals with respect
to the fitted values, but here there clearly is; most of the problem is at low
fitted values, i.e., the beginning of the time series. There is also a large
discrepancy near 40 m, which corresponds to the anomalous year 1988.
If residuals are correlated in time (as in this case), the OLS regression
is not optimal. Instead, the trend should be fit by Generalized Least
Squares (GLS).
In OLS the residuals ε are assumed to be independently and identically
distributed with the same variance σ²:

y = Xβ + ε,  ε ∼ N(0, σ²I)   (5)

In GLS the residuals η may instead be correlated, with a general variance–
covariance matrix:

y = Xβ + η,  η ∼ N(0, V)   (6)

where V is the variance-covariance matrix of the residuals, V = σ²C, σ²
is the variance of the residuals, and C is the correlation matrix.
The computations are performed with the gls function of the nlme ‘Non-
linear mixed effects models’ package [2].
Task 43 : Set up and solve a GLS model, using the covariance structure
estimated from the variogram of the OLS residuals. •
The linear model formulation is the same as for lm. However:
• It has an additional argument correlation, which specifies the
correlation structure.
• This is built with various correlation models; we use corAR1 for
AR(1) temporal correlation. This requires two arguments:
– value the value of the lag 1 autocorrelation, which must be
between -1 and 1;
– form a one-sided formula specifying the time covariate, if any.
In this case there is no covariate, so only an intercept is speci-
fied as ~1.
The value can be changed during optimization and will be reported
in the results.
We obtain the initial value from the acf function; the second value is the
one-lag autocorrelation.
Note: For a list of the predefined model forms see ?corClasses. Users
can also define their own corStruct classes.
library(nlme)
(cor.value <- acf(gw, plot=FALSE)$acf[2])
## [1] 0.9878291
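The GLS fit itself is not shown in this extract; it was presumably set up along these lines (the object name m.gw.gls matches the code that follows; treat the exact call as an assumption):

m.gw.gls <- gls(gw ~ time, data=gw.f,
                correlation=corAR1(value=cor.value, form=~1))
summary(m.gw.gls)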
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.9019852 -0.7551349 -0.2290400 0.4591564 2.1482906
##
## Residual standard error: 2.911948
## Degrees of freedom: 360 total; 358 residual
coef(m.gw)
## (Intercept) time
## -1576.2972795 0.8130209
coef(m.gw.gls)
## (Intercept) time
## -1500.4355894 0.7750657
Task 44 : Plot this GLS trend on the time series, with the OLS trend for
comparison. •
plot(gw.f$time, gw.f$gw, type="l", col="darkgreen",
ylab="Groundwater level (m)", xlab="Year")
title("Anatolia well 1, linear trend")
abline(m.gw.gls, col="darkgreen")
lines(x=as.numeric(gw.f$time), y=fitted(m.gw), col="red")
legend("topleft", c("GLS","OLS"), lty=1, col=c("darkgreen","red"))
[Figure: 'Anatolia well 1, linear trend' — the series with the GLS and OLS trend lines and a legend]
Q44 : How has the slope of the trend changed from the OLS to the GLS
model? Jump to A44 •
Higher-order trend The trend seems to have two inflection points (around
1980 and 1985), so perhaps fitting a cubic trend might give a better
model. The anova function compares two models.
m.gw.3 <- lm(gw ~ I(time^3) + time, data=gw.f)
summary.lm(m.gw.3)
##
## Call:
## lm(formula = gw ~ I(time^3) + time, data = gw.f)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7362 -1.9403 -0.1648 1.6461 7.4775
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.864e+04 4.985e+03 5.746 1.96e-08 ***
## I(time^3) 1.917e-06 3.163e-07 6.062 3.42e-09 ***
## time -2.197e+01 3.758e+00 -5.845 1.14e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.403 on 357 degrees of freedom
## Multiple R-squared: 0.8975,Adjusted R-squared: 0.8969
## F-statistic: 1563 on 2 and 357 DF, p-value: < 2.2e-16
anova(m.gw.3, m.gw)
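The cubic-trend figure below was presumably drawn like the earlier linear-trend plot, e.g.:

plot(gw.f$time, gw.f$gw, type="l", col="darkgreen",
     ylab="Groundwater level (m)", xlab="Year")
title("Anatolia well 1, cubic trend")
lines(x=as.numeric(gw.f$time), y=fitted(m.gw.3), col="red")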
[Figure: 'Anatolia well 1, cubic trend' — the series with the cubic fit superimposed]
Q45 : Is the cubic trend model better than the linear trend? Jump to
A45 •
Splitting the series Clearly the series to about 1978 is different from
that after; perhaps the extraction did not begin until then?
Task 45 : Model the trend since 1978 with both an OLS and a GLS
model. •
The subset function is used to limit the time series in the dataframe. We
have to re-compute the starting values for the autocorrelation parameter
of the GLS model from just this part of the series.
gw.f.78 <- subset(gw.f, gw.f$year > 1977)
summary(m.gw.78 <- lm(gw ~ time, data=gw.f.78))
##
## Call:
## lm(formula = gw ~ time, data = gw.f.78)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0422 -1.4399 -0.0702 1.3550 7.3310
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1763.3608 30.4744 -57.86 <2e-16 ***
## time 0.9068 0.0153 59.26 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.147 on 322 degrees of freedom
## Multiple R-squared: 0.916,Adjusted R-squared: 0.9157
## F-statistic: 3511 on 1 and 322 DF, p-value: < 2.2e-16
(cor.value <- acf(gw.f.78, plot=FALSE)$acf[2])
## [1] 0.9841868
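The GLS fit for this subset is not shown in this extract; a sketch, matching the object name m.gw.78.gls used below, is:

m.gw.78.gls <- gls(gw ~ time, data=gw.f.78,
                   correlation=corAR1(value=cor.value, form=~1))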
[Figure: Anatolia well 1 since 1978, with GLS and OLS trend lines]
For this portion of the time series the GLS and OLS models are almost
identical:
coef(m.gw.78)
## (Intercept) time
## -1763.3607555 0.9067699
coef(m.gw.78.gls)
## (Intercept) time
## -1764.379848 0.907287
Task 46 : Compare the GLS model since 1978 with the GLS model for
the whole series. •
coefficients(m.gw.gls)["time"]
## time
## 0.7750657
coefficients(m.gw.78.gls)["time"]
## time
## 0.907287
Q46 : Is the average annual change different for the model fit on the
entire series versus the model fit on the post-1977 section? Which would
you use in extrapolating into the future? Jump to A46 •
Task 48 : Predict the groundwater level from 2005 to 2050; graph this
with its 95% prediction interval. •
Again we use the predict.lm method, this time with a sequence of times
at which to predict.
gw.2050 <- predict.lm(m.gw.78, data.frame(time=2005:2050),
interval="prediction", level=0.95)
str(gw.2050)
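The plotting commands that follow assume the series was first plotted with the axes extended to cover the extrapolation period; a sketch (the exact limits are assumptions read off the figure):

plot(gw.f$time, gw.f$gw, type="l", col="darkgreen",
     xlim=c(1975, 2050), ylim=c(30, 100),
     ylab="Groundwater level (m)", xlab="Year")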
grid()
abline(v=2005, lty=2)
lines(x=as.numeric(gw.f$time[gw.f$year > 1977]),
y=fitted(m.gw.78), col="blue")
lines(2005:2050, gw.2050[,"fit"])
lines(2005:2050, gw.2050[,"upr"], col="red", lty=2)
lines(2005:2050, gw.2050[,"lwr"], col="red", lty=2)
text(1990,60,"fit", col="blue")
text(2030,60,"extrapolation")
[Figure: observed series and fitted trend to 2004 ('fit') and the extrapolated trend to 2050 ('extrapolation') with 95% prediction limits]
## 'data.frame': 360 obs. of 6 variables:
## $ gw : Time-Series from 1975 to 2005: 34.4 34.5 34.7 34.8 34.9 ...
## $ year : num 1975 1975 1975 1975 1975 ...
## $ cycle : num 1 2 3 4 5 6 7 8 9 10 ...
## $ time : Time-Series from 1975 to 2005: 1975 1975 1975 1975 1975 ...
## $ in.yr : num -0.818 -0.727 -0.477 -0.378 -0.297 ...
## $ nonseas: Time-Series from 1975 to 2005: 34.5 34.8 35.3 35.8 35.6 ...
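The model summarized below was presumably fit to the non-seasonal component (the series minus its stl seasonal component, stored in gw.f$nonseas — an assumption) for the period since 1978; the call matches the summary:

m.gw.nonseas <- lm(nonseas ~ time, data=subset(gw.f, gw.f$year > 1977))
summary(m.gw.nonseas)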
##
## Call:
## lm(formula = nonseas ~ time, data = subset(gw.f, gw.f$year >
## 1977))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9693 -0.9538 -0.1106 0.8762 8.2621
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.757e+03 2.558e+01 -68.70 <2e-16 ***
## time 9.036e-01 1.284e-02 70.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.802 on 322 degrees of freedom
## Multiple R-squared: 0.9389,Adjusted R-squared: 0.9387
## F-statistic: 4950 on 1 and 322 DF, p-value: < 2.2e-16
[Figure: the time series, its non-seasonal component, and the fitted linear trend]
Task 50 : Compare this linear model (with the seasonal component
removed) to the linear model of the series since 1978 computed in the
previous subsection. Consider (1) the amount of variability explained;
(2) the slope of the trend. •
(tmp <- summary(m.gw.nonseas)$adj.r.squared -
summary(m.gw.78)$adj.r.squared)
## [1] 0.02299186
tmp/summary(m.gw.78)$adj.r.squared
## [1] 0.02510743
coefficients(m.gw.nonseas)["time"]
## time
## 0.9035588
coefficients(m.gw.78)["time"]
## time
## 0.9067699
Q49 : By how much does the slope of the trend change when the sea-
sonal component is removed before modelling the trend? Jump to A49
•
A non-parametric test is one that does not assume any underlying distri-
bution. In the trend analysis of the previous section (§4.3.1) we assumed
that the residuals (after accounting for seasonality and trend) were inde-
pendently and identically normally-distributed (IIND); this is a require-
ment for using ordinary least squares (OLS) to fit a linear model.
Task 51 : Check that the residuals of the trend analysis are IID. •
There are various formal tests, but we will visualize the regression di-
agnostics with the plot function applied to linear model results, using
the which argument to select graphs 1 (residuals vs. fitted values) and 2
(normal quantile-quantile plot of residuals).
par(mfrow=c(1,2))
plot(m.gw.nonseas, which=1:2)
par(mfrow=c(1,1))
[Figure: 'Residuals vs Fitted' and 'Normal Q–Q' diagnostic plots for m.gw.nonseas; observations 157–159 are flagged as extreme]
Since the residuals do not meet the criteria for OLS, the confidence inter-
vals computed for the slope may not be accurate; further, the apparent
trend may not be real. In this case the trend is quite strong, so this is
not an issue. Still, we discuss how to detect a trend in a series where OLS
models are not justified.
One approach is to use a robust regression [3] to estimate the trend and its
confidence limits.
Another approach is to use a non-parametric test. Hirsch et al. [11]
present the Mann-Kendall test for trend, which has been included in the
Kendall package written by Ian McLeod (co-author of [10]).
Task 52 : Check for a trend in the groundwater levels time series since
1978. •
The MannKendall function of the Kendall package computes this test
for a time series. The SeasonalMannKendall function does the same
under the alternative hypothesis that for one or more months the sub-
sample is not distributed identically. In the present case the series is
clearly seasonal, so we can either compute the seasonal test for the se-
ries or the non-seasonal test for the decomposed series (i.e., we remove
the seasonality). These should give similar results.
require(Kendall)
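The test itself is not shown in this extract; the calls were presumably of this form (the function names are from the Kendall package; which series was passed is an assumption):

SeasonalMannKendall(gw)        # seasonal test on the raw series
MannKendall(gw.f$nonseas)      # ordinary test on the de-seasonalized series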
The τ-value is a test statistic that is then compared to a theoretical value,
resulting in the probability that a τ-value this large could have occurred
by chance in a series with no true trend.
Q51 : What is the probability that there is no trend in this series? Jump
to A51 •
Note the use of na.omit to account for the possibility of missing values
in the time series, and hence missing differences; this will be used in the
following example (§4.3.3).
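The slope estimate compared below, and the histogram that follows, come from the individual pairwise slopes (a Theil–Sen-style estimate). The original code is not shown in this extract; a sketch under those assumptions is:

y  <- as.numeric(gw.f$nonseas); tt <- as.numeric(gw.f$time)
ij <- combn(length(y), 2)                               # all pairs of observations
slopes <- na.omit((y[ij[2,]] - y[ij[1,]]) / (tt[ij[2,]] - tt[ij[1,]]))
median(slopes)                                          # robust slope estimate
hist(slopes, main="individual slope estimates", xlab="slope")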
coefficients(m.gw.nonseas)["time"]
## time
## 0.9035588
[Figure: 'individual slope estimates' — histogram of the pairwise slope estimates for the de-seasonalized series]
In some time series the trend is not so obvious, and the deviations from
IIND residuals are much stronger. For example, the Kendall package
includes the sample dataset GuelphP, a monthly time series of phosphorus
(P) concentrations in mg l-1 in the Speed River, Guelph, Ontario,
January 1972 through January 1977.
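The plot below was presumably produced along these lines (the styling is an assumption):

data(GuelphP)
plot(GuelphP, type="b", ylab="P concentration, mg l-1",
     main="Speed River water quality")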
[Figure: 'Speed River water quality' — monthly P concentration (mg l-1), 1972–1977]
Q52 : Describe this time series qualitatively (in words). Does there seem
to be a linear trend? Jump to A52 •
[Figure: 'individual slope estimates' — histogram of the pairwise slope estimates for the GuelphP series]
Q53 : Is there a trend? If so, what is its slope? Is this slope meaningful?
Jump to A53 •
The simplest model form is the autoregressive (AR) model. Here the
values in the series are correlated to some number of immediately pre-
ceding values. The strength of the correlation, relative to the white noise,
gives the continuity of the series.
The AR process is defined such that each Yt in the sequence is computed
from some set of previous values Ys , s < t, plus white noise Zt . This
white noise is independently and identically distributed (IID) at each time
step.
This model implies an underlying process with no trend, where the value
at one time point is partly retained at the next in an AR(1) process;
this can be considered inertia in the process. The autocorrelation is not
perfect; this allows white noise, i.e., completely random processes, to
alter the next value(s).
To simplify computations, the series is centred by subtracting the overall
mean µ and considering the differences:

(Y_t − µ) = Σ_{l=1}^p α_l (Y_{t−l} − µ) + Z_t   (8)
This has the same form as a linear regression, and the single parameter
α1 can be computed in the same way.
This series is sometimes called a red noise process, because the white
noise represented by the sequence {Zt } has the high-frequency varia-
tions (“blue”, by analogy with the light spectrum) smoothed out by the
autocorrelation. The low-frequency (“red”) random variations are pre-
served.
In §3.5 we saw that the remainders for the groundwater levels of well 1
had no partial autocorrelations, after the first order was accounted for.
This indicates an AR(1) model.
Note that the mean of the remainders should be zero, if we’ve accounted
for the trend and cycles; in this case it is very close:
mean(gw.r)
## [1] -0.06420781
[Figure: 'Anatolia well 1 remainders, lag 1 scatterplot' — remainder vs. its one-month lag]
Clearly a linear relation between the series of remainders and its first
lagged series is justified. We compute this relation first with the stan-
dard lm method; note we must relate an observation to the preceding
observation (time flows in one direction!). We must first produce a time-
series offset by one month. We saw in §3.4 that the lag function lags
the time series but does not shift it; here we need a shift in order to
relate the previous month’s level to the current month’s level. The shift
is effected by subscripting, using the [] operator.
We first construct the two series, subtracting in each case the mean:
gw.r.0 <- gw.r[2:(length(gw.r)-1)] - mean(gw.r)
gw.r.1 <- lag(gw.r,1)[1:(length(gw.r)-2)] - mean(gw.r)
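The fitted model summarized below relates the remainder to its one-month lag; the calls (and the scatterplot they were drawn on) were presumably:

plot(gw.r.0 ~ gw.r.1, xlab="Lag 1", ylab="Original")
summary(m.lag1 <- lm(gw.r.0 ~ gw.r.1))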
##
## Call:
## lm(formula = gw.r.0 ~ gw.r.1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9367 -0.2370 -0.0304 0.2109 1.5253
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.001855 0.029559 0.063 0.95
## gw.r.1 0.887310 0.024360 36.424 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5593 on 356 degrees of freedom
## Multiple R-squared: 0.7884,Adjusted R-squared: 0.7878
## F-statistic: 1327 on 1 and 356 DF, p-value: < 2.2e-16
abline(m.lag1)
(alpha.1 <- cor(gw.r.0,gw.r.1))
## [1] 0.8879416
[Figure: remainder ('Original') vs. its one-month lag ('Lag 1'), with the fitted regression line]
Note that the comparable figure for the uncorrected series is much higher,
because of the inherent continuity within the annual cycle:
cor(gw[2:(length(gw)-1)] - mean(gw),
lag(gw,1)[1:(length(gw)-2)] - mean(gw))
## [1] 0.9934396
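The estimate referred to in the note below (0.8869) is the lag-1 autocorrelation computed by acf, which is also the Yule–Walker estimate of the AR(1) coefficient; presumably:

acf(gw.r, plot=FALSE)$acf[2]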
## [1] 0.8868833
Note: The slight difference between this estimate 0.8869 and the esti-
mate directly from linear correlation 0.8879 may be due to how the two
computations deal with the ends of the series.
Finally, the innovation variance is the variance of the white noise of Eq.
9. This is computed as [20, §8.3.1]:

σ_Z² = (1 − α²) σ_Y²   (10)

where σ_Y² is the variance of the time series. That is, the noise is reduced
from the observed noise in the series by the autocorrelation – that much
of the noise is accounted for by the model. This illustrates the “red shift”
mentioned above. For sampled time series of length n, the series variance
σ_Y² is estimated from the sample variance s_Y², the true correlation α is
estimated as α̂, and the noise must be corrected for bias:

s_Z² = ((n − 1)/(n − 2)) (1 − α̂²) s_Y²   (11)
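The two values below are the sample variance of the remainder series and the innovation variance from Eq. 11; a sketch of the computation, using the lag-1 correlation alpha.1 estimated above:

(s2.Y <- var(gw.r))                              # series variance
n <- length(gw.r)
((n-1)/(n-2) * (1 - alpha.1^2) * s2.Y)           # innovation variance, Eq. 11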
## [1] 1.469715
## [1] 0.3118009
We will return to this example and simulate an AR(1) series with these
fitted parameters in §8.
## , , 1
##
## [,1]
## [1,] 0.98782910
## [2,] -0.24150277
## [3,] -0.04679133
## [4,] 0.03321400
## [5,] 0.15111996
## , , 1
##
## [,1]
## [1,] 0.88688330
## [2,] -0.08180894
## [3,] -0.02809817
## [4,] -0.02235775
## [5,] -0.02513076
[Figure: partial autocorrelation functions of the original series (gw) and of the remainders (gw.r)]
Once the preceding lag is taken into account (high positive correlation,
high continuity) we see that the second lag is negatively correlated (lack
of continuity). Even for the remainders, this is the case but not quite at
the level of significance.
For an AR(2) process the Yule–Walker equations relate the two coefficients
to the first two sample autocorrelations r1 and r2:

r1 = α̂1 + α̂2 r1
r2 = α̂1 r1 + α̂2

and the innovation variance is further reduced:

s_Z²(2) = ((n − 1)/(n − 2)) (1 − α̂2²)(1 − r1²) s_Y²   (12)
where r1 is the sample correlation at lag 1. Thus the original white noise
is reduced yet further, as the degree of the AR model increases.
The ar function not only solves these equations, but also solves them
for all orders from AR(1), AR(2) . . . until the higher-order fit is not better,
as judged by the AIC.
(ar.gw.r <- ar(gw.r, method="yule-walker"))
##
## Call:
## ar(x = gw.r, method = "yule-walker")
##
## Coefficients:
## 1 2
## 0.9594 -0.0818
##
## Order selected 2 sigma^2 estimated as 0.3133
ar.gw.r$var.pred
## [1] 0.3133392
ar.gw.r$ar
For comparison, we re-fit the AR(1) series also, using the order.max
argument to force ar to only fit this order.
(ar.gw.r.1 <- ar(gw.r, order.max=1))
##
## Call:
## ar(x = gw.r, order.max = 1)
##
## Coefficients:
## 1
## 0.8869
##
## Order selected 1 sigma^2 estimated as 0.3146
ar.gw.r.1$ar
## [1] 0.8868833
We will return to this example and simulate an AR(2) series with these
fitted parameters in §8.
The βj weight the relative contributions of the previous values. The time
series results from random noise, which (if any βj ≠ 0) can “drift” into
an apparent trend, which in fact is the result of the stochastic process,
not a true trend. Thus, MA models are often used to model apparent
trends.
The “I” in ARIMA stands for “integrated”. These are ARMA models with
an additional element: the degree of differencing applied to the series
before ARMA analysis. ARIMA models are conventionally specified
with three components (p, d, q), which are:
1. p: the AR order;
2. d: the degree of differencing; and
3. q: the MA order.
Differencing is applied so that the series is second-order stationary, i.e.,
the expected value and covariance do not depend on the position in the
series.
ARIMA models are fit with the arima function. This requires at least
two arguments: the series and the order. To illustrate, we re-fit the
AR(2) model of the well level residuals (§4.4.1) with arima. The order is
(2, 0, 0):
(arima.gw.r <- arima(gw.r, order=c(2,0,0)))
##
## Call:
## arima(x = gw.r, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 0.9624 -0.0851 -0.0925
## s.e. 0.0524 0.0525 0.2341
##
## sigma^2 estimated as 0.3078: log likelihood = -299.48, aic = 606.95
Q57 : Does the ARIMA(2,0,0) fit give the same coefficients as the AR(2)
fit? Jump to A57 •
The coefficients for model fit by ar are in field ar; for a model fit by
arima in field coef:
ar.gw.r$ar
arima.gw.r$coef
A periodic autoregressive (PAR) model allows the AR coefficients and the
mean to depend on the position τ in the cycle (with η indexing the cycle):

(Y_{η,τ} − µ_τ) = Σ_{l=1}^p α_{l,τ} (Y_{η,τ−l} − µ_τ) + Z_{η,τ}   (14)
Note: Note that a PAR model can not have any trend, just the cyclic com-
ponent. If there is an overall trend it must be removed before modelling.
Task 59 : Fit a PAR model to the de-trended time series of groundwater
levels of Anatolia well 1 since 1978. •
Recall that the behaviour before 1978 was qualitatively different than
that since; we suspect that extraction began in 1978. In §4.2 a linear
trend was established for that period:
gw.f.78 <- subset(gw.f, gw.f$year > 1977)
coef(m.gw.78 <- lm(gw ~ time, data=gw.f.78))
## (Intercept) time
## -1763.3607555 0.9067699
We subtract the fits from this model from each observation (using the
fitted extractor function on the linear model), to get the de-trended
series. We also need to extract the time-series window, using window.
gw.1978 <- window(gw, c(1978,1), c(2004,12))
str(gw.1978)
## Time-Series [1:324] from 1978 to 2005: 30.4 30.3 30.1 29.9 29.9 ...
str(fitted(m.gw.78))
## Time-Series [1:324] from 1978 to 2005: 0.1199 -0.0457 -0.2813 -0.5568 -0.5924 ...
## - attr(*, "names")= chr [1:324] "37" "38" "39" "40" ...
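The de-trended series (deviations from the linear trend) plotted below was presumably computed by subtracting the fitted values; the object name gw.1978.0 matches the arima calls that follow:

gw.1978.0 <- gw.1978 - fitted(m.gw.78)    # deviations from the linear trend
plot(gw.1978.0, main="Anatolia well 1",
     ylab="Deviations from linear trend, m")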
[Figure: 'Anatolia well 1' — deviations from the linear trend (m) since 1978]
We now fit a PAR model, with AR(2) for the non-seasonal part (as revealed
by our previous analysis) and different AR orders for the seasonal part.
A seasonal order of (0, 0, 0) corresponds to the same cycle each year.
Higher-order AR terms represent autocorrelation of cycles year-to-year;
e.g., a high-amplitude cycle tends to be preceded and followed by a cycle
of similar amplitude.
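The three fits whose output follows were produced with arima, increasing the seasonal AR order from 0 to 2 (the calls are reproduced from the Call lines in the output):

arima(gw.1978.0, order=c(2,0,0), seasonal=c(0,0,0))
arima(gw.1978.0, order=c(2,0,0), seasonal=c(1,0,0))
arima(gw.1978.0, order=c(2,0,0), seasonal=c(2,0,0))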
##
## Call:
## arima(x = gw.1978.0, order = c(2, 0, 0), seasonal = c(0, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 1.3306 -0.4585 -0.0046
## s.e. 0.0492 0.0492 0.3326
##
## sigma^2 estimated as 0.5983: log likelihood = -377.66, aic = 763.32
##
## Call:
## arima(x = gw.1978.0, order = c(2, 0, 0), seasonal = c(1, 0, 0))
##
## Coefficients:
## ar1 ar2 sar1 intercept
## 1.1142 -0.2404 0.4272 -0.0026
## s.e. 0.0639 0.0640 0.0593 0.5293
##
## sigma^2 estimated as 0.5159: log likelihood = -354.63, aic = 719.27
##
## Call:
## arima(x = gw.1978.0, order = c(2, 0, 0), seasonal = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 sar1 sar2 intercept
## 1.0657 -0.1691 0.3604 0.2144 -0.0118
## s.e. 0.0618 0.0635 0.0562 0.0571 0.8162
##
## sigma^2 estimated as 0.4923: log likelihood = -347.92, aic = 707.84
Q58 : How much does modelling the seasonal component improve the
fit? Which degree of autoregression among seasons is indicated? Jump
to A58 •
empirical model that can be used for this purpose. Interpretation in
terms of underlying processes is not straightforward.
Modelling with an ARIMA model has three stages:
1. Model identification;
2. Parameter estimation;
3. Diagnostic checking of model suitability.
These stages are iterated until the model is deemed “suitable”; then the
model is ready to use for process interpretation or forecasting.
Task 60 : Plot the groundwater time series and evaluate its stationarity;
also zoom in on a three-year window to see the fine structure. •
par(mfrow=c(1,2))
plot(gw, main="Anatolia well 1",
ylab="Groundwater level (m)")
plot(window(gw, 1990, 1994), main="Anatolia well 1",
ylab="Groundwater level (m)")
par(mfrow=c(1,1))
[Figure: 'Anatolia well 1' — the full series (left) and a 1990–1994 window (right)]
Task 61 : Plot the first difference of the groundwater time series and
evaluate its stationarity; also zoom in on a three-year window to see the
fine structure. •
par(mfrow=c(1,2))
plot(diff(gw), main="Anatolia well 1",
ylab="Groundwater level (m), delta-1")
plot(diff(window(gw, 1990, 1994)), main="Anatolia well 1",
ylab="Groundwater level (m), delta-1")
par(mfrow=c(1,1))
[Figure: first difference of the series, full period and 1990–1994 window]
Task 62 : Plot the second difference of the groundwater time series and
evaluate its stationarity. •
par(mfrow=c(1,2))
plot(diff(diff(gw)), main="Anatolia well 1",
ylab="Groundwater level (m), delta-2")
plot(diff(diff(window(gw, 1990, 1994))), main="Anatolia well 1",
ylab="Groundwater level (m), delta-2")
par(mfrow=c(1,1))
[Figure: 'Anatolia well 1' — second difference of the series, full period and 1991–1994 window]
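The third difference shown below was presumably computed by extending the same pattern:

par(mfrow=c(1,2))
plot(diff(diff(diff(gw))), main="Anatolia well 1",
     ylab="Groundwater level (m), delta-3")
plot(diff(diff(diff(window(gw, 1990, 1994)))), main="Anatolia well 1",
     ylab="Groundwater level (m), delta-3")
par(mfrow=c(1,1))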
[Figure: third difference of the series, full period and 1991–1994 window]
There seems to be little change between the second and third differences.
Another way to look at the stationarity is with the autocorrelation function,
plotted by acf.
Task 64 : Plot the autocorrelation functions for the original time series
and the first three differences. •
par(mfrow=c(2,2))
acf(gw)
acf(diff(gw))
acf(diff(diff(gw)))
acf(diff(diff(diff(gw))))
par(mfrow=c(1,1))
[Figure: autocorrelation functions of the original series and of its first, second, and third differences]
[Figure: partial autocorrelation functions of the post-1978 series (gw.1978) and of its differences]
[Figure: autocorrelation function of the second difference of the post-1978 series, diff(diff(gw.1978))]
The second step is to estimate the model parameters, using the arima
function. This must be supplied with three parameters, which specify the
model type; these are conventionally known as p (the AR order), d (the
degree of differencing), and q (the MA order), as explained above. ARIMA
models may also declare a periodic (also called seasonal) component,
with the same parameters.
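The fit whose output follows matches this call (reproduced from the Call line in the output):

arima(gw.1978, order=c(13,2,0))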
##
## Call:
## arima(x = gw.1978, order = c(13, 2, 0))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7
## -0.8569 -0.8617 -0.8759 -0.9057 -0.8682 -0.8971 -0.8651
## s.e. 0.0556 0.0719 0.0801 0.0834 0.0839 0.0820 0.0828
## ar8 ar9 ar10 ar11 ar12 ar13
## -0.9231 -0.8893 -0.7707 -0.5805 -0.2637 -0.0708
## s.e. 0.0816 0.0835 0.0830 0.0798 0.0715 0.0553
##
## sigma^2 estimated as 0.4767: log likelihood = -340.38, aic = 708.76
The third step is model checking; the tsdiag function produces three
diagnostic plots for ARIMA models:
1. standardized residuals (should show no pattern with time);
2. autocorrelation function (ACF) of residuals (should have no signifi-
cant autocorrelation);
3. the Ljung-Box statistic for the null hypothesis of independence in
the time series of residuals.
We can see the effect of a poor model fit by under-fitting the groundwater
levels with an AR model:
m.ar <- arima(gw.1978, c(3,2,0))
tsdiag(m.ar)
[Figure: tsdiag diagnostics for the under-fitted ARIMA(3,2,0) model — the residual ACF shows significant autocorrelation and the Ljung–Box p-values are essentially zero]
[Figure: tsdiag diagnostics for a better-fitting model — residuals show no pattern and the Ljung–Box p-values are all high]
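The forecast summarized below was presumably produced with predict on the fitted ARIMA model, 96 months (8 years) ahead; the model object name here is hypothetical:

# 'm.arima' is a placeholder name for the ARIMA model fitted above
p.ar <- predict(m.arima, n.ahead=96)    # forecast 96 months ahead
str(p.ar)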
## List of 2
## $ pred: Time-Series [1:96] from 2005 to 2013: 54 53.3 52.9 53.1 53.8 ...
## $ se : Time-Series [1:96] from 2005 to 2013: 0.69 1.05 1.32 1.54 1.72 ...
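The lines that follow add the error bounds to a plot of the forecast; the initial plot call is not shown in this extract, but was presumably something like:

plot(p.ar$pred, main="Anatolia well 1",
     ylab="Predicted groundwater level, m")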
lines(p.ar$pred+p.ar$se, col="red", lty=2)
lines(p.ar$pred-p.ar$se, col="red", lty=2)
grid()
[Figure: 'Anatolia well 1' — forecast groundwater level with ± one standard error limits]
Q62 : What happens to the prediction as the time forward from the
known series increases? What happens to the confidence intervals? As
a groundwater manager, how far ahead would you be willing to use this
predicted series? Jump to A62 •
4.7 Answers
A41 : (1) There is a long-term trend, to a slightly shallower level until 1980
and then steadily to a deeper level. The slope of the trend (groundwater level
vs. time interval) varies a bit over 1980–2005, which may reflect rainfall
differences. The trend is most likely due to increased extraction for irrigation,
since the quantity of extraction far exceeds annual rainfall; which we see in . . . (2) There is
a seasonal cycle in groundwater level, due to recharge in the winter (rains) and
extraction in the summer; however the magnitude of the fluctuation appears
to increase from about 1983–1995 and has since stabilized. The increasing
fluctuation may be caused by more extraction but is also balanced by more
rainfall. (3) The remainder is of similar magnitude to the annual cycle (±2 m)
and is strongly auto-correlated; the explanation for this is unclear. The very
large remainder in 1988 was discussed in §2.1. Return to Q41 •
increase is given by the slope coefficient, 0.813 Return to Q42 •
A44 : The slope is somewhat shallower: GLS 0.7751 vs. OLS 0.813. Return to
Q44 •
A45 : Yes, the proportion of variation explained has increased to 0.9; the
ANOVA shows a highly significant improvement in fit. The cubic adjusts some-
what to the initial part of the series. Return to Q45
•
A46 : The average annual increase has changed considerably, increasing from
0.775 for the whole series to 0.907 for the shorter series. The steeper slope
ignores the earliest part of the series, when the process was different, and so
seems to better reflect current conditions and is preferable for predicting in
the short term. Return to Q46 •
A47 : The trend can not continue forever – for example, at a certain point the
groundwater level will be too deep to pump economically and will stabilize at
that level. Also, future rainfall patterns and future demand for irrigation water
may change. Return to Q47 •
A48 : The variance explained increases 2.51% compared to the fit to the
original time series, because there is less overall variability. Return to Q48 •
A49 : The slope of the trend is almost identical; the fit without the seasonal
component predicts 3.21 mm less drawdown per year. Return to Q49 •
A52 : The series from 1972 – 1974 has large fluctuations and high concentra-
tions; after 1974 these are both much lower, except for some spikes in 1975.
There seems to be a seasonal component. Some observations are missing.
There is no linear trend; instead there seems to be a discontinuity in 1974,
from high to low concentrations. Return to Q52 •
A53 : The probability that the observed trend, estimated as -0.056 (mg l-1)
yr-1, is due to sampling variation is very small, about 1 in 10⁷. Return to Q53
•
A55 : After correcting for immediate continuity (since this is ground water,
to be expected) the lag-2 level or remainder introduces a negative correction.
That is, going two steps forward with a single AR(1) model would over-state
the continuity, compared to an AR(2) model. Return to Q55 •
A56 : An AR(2) series is selected; thus the lag-2 component can be modelled.
The residual variance decreases slightly, from 0.3146 to 0.3133. Most of the
variance reduction was from the original series to the AR(1) series; the AR(2)
series is only a small improvement. Return to Q56 •
A57 : No, there are slight differences. The AR(2) fit gives the two coefficients
as 0.9594 and -0.0818; for the ARIMA(2,0,0) fit these are computed as 0.9624
and -0.0851 Return to Q57 •
A58 : The estimated residual variance (σ²) of the best seasonal ARIMA fit is 0.4923,
compared to 0.3133 for the non-seasonal model; this is somewhat higher, i.e., the fit is
worse. But the seasonal model also includes the seasonal component, which
was removed from the time series fit with the non-seasonal model. Return to
Q58 •
A59 : There seems to be a clear trend to deeper levels since 1982; further the
expected values seem to follow an annual cycle. Both are indications that this
is not a first-order stationary series. The variance increases over time. So, this
is not second-order stationary. Return to Q59 •
A60 : There is no apparent trend, but there still seems to be an annual cycle.
So, this is not first-order stationary. The variance increases over time. So, this
is not second-order stationary. Return to Q60 •
A61 : The ACF of the original series does not decay to zero; this indicates a
trend. Once this is removed by the first difference, the ACF shows clear cyclic
behaviour. This is largely removed by the second difference and completely by
the third. Return to Q61 •
A62 : The prediction becomes more uniform the further out we predict, i.e.,
the cyclic behaviour is damped and approaches the overall linear trend. The
upper and lower confidence limits become wider and also are damped; but
they rapidly become much larger than the annual cycle. The prediction does
not seem very useful to the manager; it seems more logical to use the average
behaviour of the cycles in the known series and add it to the trend. Return to
Q62 •
5 Intervention analysis
Time series may arise completely or mostly from natural processes (e.g.,
rainfall, temperature) but may also be influenced by human activities.
These interventions may be one-time (e.g., damming a river) or pulsed
(e.g., streamflow with controlled releases from dams). Hipel and McLeod
[10, Ch. 19] is an excellent explanation of intervention analysis, which
has two main uses:
1. Determining the effect of a known intervention (e.g., has a new
sewage treatment plant improved water quality?);
2. Identifying probable unknown interventions (e.g., is there a new
source of pollutants?).
Each of these may affect the series in several ways, known as transfer
functions:
Task 67 : Load this dataset and plot as a time series, showing the
known intervention. •
require(Kendall)
data(GuelphP)
str(GuelphP)
## Time-Series [1:72] from 1972 to 1978: 0.47 0.51 0.35 0.19 0.33 NA 0.365 0.65 0.825 1 ...
## - attr(*, "title")= chr "Phosphorous Data,Speed River,Guelph,1972.1-1977.1"
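The figure below was presumably drawn along these lines (the intervention date, February 1974, is taken from the discussion that follows; the styling is an assumption):

plot(GuelphP, type="b", ylab="P concentration, mg l-1")
abline(v=1974+1/12, col="red", lty=2)
text(1974+1/12, 1.0, "known intervention", pos=4, col="red")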
[Figure: Speed River P concentration time series with the known intervention marked by a vertical line]
Note the use of the abline function to add a line to the graph, using the
v argument to place a vertical line at the indicated date.
How can we reliably quantify this difference? One obvious way is to split
the series and compare the means, medians, or variances.
Task 68 : Split the series at February 1974 and compare the means,
medians, or variances. •
Since these statistics do not involve time series, and there are missing
values, we convert to ordinary vectors with as.vector and remove the
NA’s with na.omit:
(guelph.1 <- na.omit(as.vector(window(GuelphP, start=NULL,
end=1974+1/12))))
## [1] 0.470 0.510 0.350 0.190 0.330 0.365 0.650 0.825 1.000 0.385 0.900
## [12] 0.295 0.140 0.220 0.200 0.140 0.400 0.495 1.100 0.590 0.270 0.300
## [23] 0.065
## attr(,"na.action")
## [1] 6 19 25
## attr(,"class")
## [1] "omit"
## [1] 0.240 0.058 0.079 0.065 0.120 0.091 0.058 0.120 0.120 0.110 0.460
## [12] 0.150 0.086 0.028 0.110 0.360 0.180 0.065 0.130 0.120 0.190 0.150
## [23] 0.107 0.047 0.055 0.080 0.071 0.121 0.108 0.169 0.066 0.079 0.104
## [34] 0.157 0.140 0.070 0.056 0.042 0.116 0.106 0.094 0.097 0.050 0.079
## [45] 0.114
## attr(,"na.action")
## [1] 15
## attr(,"class")
## [1] "omit"
mean(guelph.1); mean(guelph.2)
## [1] 0.4430435
## [1] 0.1159556
median(guelph.1); median(guelph.2)
## [1] 0.365
## [1] 0.106
sd(guelph.1); sd(guelph.2)
## [1] 0.2839124
## [1] 0.07789123
We would like to state that these are significant differences (not due
to chance) but we can’t use a t-test, because the observations are not
independent – they are clearly serially and seasonally correlated. So we
need to build a time-series model that includes the intervention.
5.2 Answers
A64 : The later series has a much lower mean, median, and especially standard
deviation (variability). Return to Q64 •
[Figure: 'Anatolia well 2' — groundwater level time series]
str(gw2)
## Time-Series [1:360, 1:2] from 1975 to 2005: 34.4 34.5 34.7 34.8 34.9 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:2] "gw" "gw.2"
Task 71 : Plot the two wells’ time series on the same graph. •
plot(gw2, plot.type="single", main="Anatolia wells 1 and 2",
ylab="Groundwater depth (m)")
lines(lowess(gw, f=1/3), col="red")
lines(lowess(gw.2, f=1/3), col="red")
[Figure: 'Anatolia wells 1 and 2' — both series on one graph, with lowess smooths]
gw gw.2
Jan 1975 34.36 13.87
Feb 1975 34.45 13.79
...
Nov 2004 55.55 15.67
Dec 2004 54.83 15.93
summary(gw2)
## gw gw.2
## Min. :29.90 Min. : 5.34
## 1st Qu.:34.90 1st Qu.:11.55
## Median :41.63 Median :13.74
## Mean :41.58 Mean :13.85
## 3rd Qu.:46.80 3rd Qu.:16.19
## Max. :57.68 Max. :22.47
Another way to see the two series is each on their own panel. This has
the effect of stretching or compressing the response (here, groundwater
level) to the same scale:
plot(gw2, plot.type="multiple", main="Anatolia, two wells")
[Figure: 'Anatolia, two wells' — the two series on separate panels]
The obvious question is how well the two series are correlated. Function
ccf computes the cross-correlation of two univariate series at a series
of lags. Note that the highest correlation between two series might not
be at lag 0 (same time); one series may lead or lag the other (for
example, stream flow at different distances from the source).
By convention the first series named is moved ahead of the second when
computing; so the cross-correlation is between x_{t+k} of the first series
and y_t of the second. So a positive lag has the first series ahead of the
second; with a negative lag the second is ahead of the first.
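The cross-correlations printed below come from ccf; a sketch of the call (the object name cc matches the code further on; the maximum lag is an assumption):

print(cc <- ccf(gw, gw.2, lag.max=22))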
##
## Autocorrelations of series 'X', by lag
##
## -1.8333 -1.7500 -1.6667 -1.5833 -1.5000 -1.4167 -1.3333 -1.2500
## 0.681 0.693 0.703 0.709 0.710 0.712 0.716 0.724
## -1.1667 -1.0833 -1.0000 -0.9167 -0.8333 -0.7500 -0.6667 -0.5833
## 0.733 0.745 0.759 0.776 0.790 0.802 0.811 0.815
## -0.5000 -0.4167 -0.3333 -0.2500 -0.1667 -0.0833 0.0000 0.0833
## 0.819 0.824 0.827 0.831 0.841 0.853 0.864 0.874
## 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500
## 0.879 0.885 0.890 0.888 0.881 0.873 0.863 0.855
## 0.8333 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167
## 0.852 0.856 0.860 0.863 0.863 0.860 0.853 0.839
## 1.5000 1.5833 1.6667 1.7500 1.8333
## 0.823 0.809 0.786 0.771 0.762
[Figure: 'gw & gw.2' — cross-correlation function of the two wells]
Q66 : What is the correlation at zero lag, i.e. the same month and year?
What is the correlation at one month positive and negative lag? Jump
to A66 •
Q67 : Why is this graph not symmetric about zero-lag? Jump to A67 •
## List of 6
## $ acf : num [1:45, 1, 1] 0.681 0.693 0.703 0.709 0.71 ...
## $ type : chr "correlation"
## $ n.used: int 360
## $ lag : num [1:45, 1, 1] -1.83 -1.75 -1.67 -1.58 -1.5 ...
## $ series: chr "X"
## $ snames: chr "gw & gw.2"
## - attr(*, "class")= chr "acf"
## [1] 0.8896892
(i <- which.max(abs(cc$acf)))
## [1] 27
cc$acf[i]
## [1] 0.8896892
cc$acf[i]^2
## [1] 0.7915468
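The two values below are the lag of the highest correlation, in years and then in months (assumed calls):

cc$lag[i]                     # lag in years
cc$lag[i] * frequency(gw)     # lag in months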
## [1] 0.3333333
## [1] 4
Q68 : What is the highest correlation and its lag? Which series leads?
Jump to A68 •
Q69 : Looking at the graphs and the numeric results, how closely corre-
lated are the two series? Jump to A69
•
the spectrum function, as in §3.6. Since there are two series, we can also
compute their coherency, i.e., how much they agree at each frequency;
this is essentially their correlation. Further, it is possible that the series
are similar but lagged. An example is monthly rainfall at two stations
where a monsoon or other seasonal frontal system reaches one station
later than the other. The coherency and phase plots are specified with
the plot.type optional argument to spectrum.
par(mfrow=c(1,3))
spectrum(gw2, spans=c(5,7), lty=1, col=c("black","red"), plot.type="marginal")
spectrum(gw2, spans=c(5,7), lty=1, col=c("black","red"), plot.type="coherency")
spectrum(gw2, spans=c(5,7), lty=1, col=c("black","red"), plot.type="phase")
par(mfrow=c(1,1))
[Figure: smoothed periodograms of the two series, their squared coherency, and their phase spectrum]
In the first plot, we can see that the two periodograms are quite similar,
although the second series has somewhat less power at one and two
years. There seems to be a slight lag of the second series behind the
first; this is most obvious at frequencies 0.5 (two years) and 2 (half-year).
In the second plot, we see the coherency between them at each frequency.
This confirms the impression from the first plot that the same climate
forcing applies to both. The lack of coherency near frequencies 0.5 and
2 is also shown here.
The third plot shows the phase differences. There are clear phase
differences at frequencies 0.5, 1, and 2 (series 2 lags) and 1.2–1.8 (series
2 leads). The large swings in phase (±) at 3.4 and 5 cycles must be
artefacts of the low power at these frequencies.
6.1 Answers
A65 : The second well is much closer to the surface than the first. Both have
annual cycles but the trend towards deeper levels is much more pronounced
in the first well. The second well appears to have more rapid fluctuations. The
timing of rapid extractions and recharges is different for the two wells. Return
to Q65 •
A66 : At lag 0 the correlation is 0.864; with the first well ahead by one month
it is 0.874; for the first well behind by one month 0.853 Return to Q66 •
A67 : Shifting one series ahead of the other (i.e., lagging it positively) is not the
same as shifting it behind. Think for example of correlating January rainfall
of one station with March in another (first station lagged +2) or correlating it
with November (lag -2); there is no reason to think these will have the same
correlation. Return to Q67 •
A68 : The highest correlation is 0.89 at lag 4; the first well’s records are moved
forward to match the second well’s records. Return to Q68 •
A69 : Although the two wells are in the same region with similar climate and
land use, the highest correlation is not even 0.9; the coefficient of determination
(proportion of variation explained, R²) is only 0.792. Return to Q69 •
7 Gap filling
Many uses of time series require complete series, without any gaps. In
practice many series contain such gaps because of mechanical failure or
observer error; an example is the daily rainfall records from Lake Tana
(§2.2).
Gap filling is reviewed by Salas [15, §19.4].
We first list several approaches to gap filling and their areas of applica-
tion. Several of these are then illustrated with examples.
Estimation from other cycles For a cyclic series (e.g. annual) a miss-
ing value may be estimated by some function (typically the mean) of the
values at the same position in the cycle. If there is a trend, the estimation
should be of, and from, the deviations from the trend.
For example, missing June rainfall for one year in a 30-year series could
be estimated as the mean of the other 29 June rainfalls. This assumes
the year is not overall wetter or drier, which may not be a reasonable
assumption.
Estimation from other series If the series with missing values is well-
correlated to one or more other series (e.g. one gauging station among a
network, or one weather station in a group), a (multivariate) regression
equation can be developed for the time series as a function of the other
series.
Task 75 : Remove five months at random from the first Anatolia well.
•
Recall, this is in time series gw:
str(gw)
## Time-Series [1:360] from 1975 to 2005: 34.4 34.5 34.7 34.8 34.9 ...
First, set up the series with simulated gaps, initially the same as the
known series:
gwg <- gw
Second, pick five separate positions to delete, using the sample function.
So that your results match these, we use set.seed to set the random-number
generator to a known starting position.
set.seed(0044)
(ix <- sample(length(gw), size=5))
## [1] 268 102 182 206 12
sort(ix)
Third, delete the values at these positions, replacing them with the miss-
ing value constant NA:
summary(gwg)
gwg[ix] <- NA
summary(gwg)
Task 76 : Plot the simulated series, showing the dates with missing
observations. •
plot(gwg, main="Groundwater depth, with gaps",
ylab="Groundwater depth (m)",
sub="Dates with missing values shown as red bars")
abline(h = min(gw), col = "gray")
abline(v=time(gw)[ix], col = "red", lwd=1)
grid()
[Figure: 'Groundwater depth, with gaps' — dates with missing values shown as red bars]
Task 77 : Try to decompose the time series with gaps into a trend,
seasonal component, and residuals. •
We use the best decomposition from §3.3, i.e. with a smoother trend and
two-year window on the seasonal amplitude. I know that this will pro-
duce an error; to avoid a fatal error I thus enclose the expression in a call
to the try function, and show the error message with the geterrmessage
function:
try(gwg.stl <- stl(gwg, s.window=25, t.window=85))
geterrmessage()
The error message tells us that the stl function failed because there
are missing values . . . a fact we know! The usual R approach to missing
values is to specify a na.action, such as na.omit or na.exclude; in
this case neither help:
try(gwg.stl <- stl(gwg, s.window=25, t.window=85, na.action="na.exclude"),
silent=TRUE)
geterrmessage()
## [1] "Error in stl(gwg, s.window = 25, t.window = 85, na.action = \"na.exclude\") : \n ser
Task 78 : Predict the ground water depths at the dates with missing
values, using linear interpolation. •
Recall (§7.1) that we know the positions in the original time-series object
gw for which we deleted the values to make series gwg:
print(ix)
gw[ix]
gwg[ix]
## [1] NA NA NA NA NA
The dates of these are found with the time function on the original se-
ries, and the [] indexing operator, using the positions of the missing
observations as indices:
time(gw)[ix]
## [1] 1997.250 1983.417 1990.083 1992.083 1975.917
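The interpolation whose structure is shown below was presumably done with approx at the missing dates:

gw.fill.linear <- approx(x=time(gwg), y=gwg, xout=time(gw)[ix])
str(gw.fill.linear)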
## List of 2
## $ x: num [1:5] 1997 1983 1990 1992 1976
## $ y: num [1:5] 43.5 34.4 39.7 41.6 35.3
print(gw.fill.linear)
## $x
## [1] 1997.250 1983.417 1990.083 1992.083 1975.917
##
## $y
## [1] 43.525 34.385 39.665 41.620 35.350
gw.fill.linear$y
gw[ix]
summary((gw[ix] - gw.fill.linear$y))
Task 79 : Plot the reconstructed points on the time series with missing
values, along with the original series. •
plot(gwg, main="Gap-filled time series",
sub="reconstructed values: red; true values: green",
ylab="Groundwater depth (m)")
points(gw.fill.linear$x, gw.fill.linear$y, col="red", cex=2)
points(gw.fill.linear$x, gw.fill.linear$y, col="red", pch=20)
points(time(gw)[ix], gw[ix], col="darkgreen", pch=20)
[Figure: 'Gap-filled time series' — reconstructed values (red) and true values (green) at the five missing dates]
Q71 : How well did the linear interpolation fill the gaps? Jump to A71
•
Task 80 : Predict the ground water depths at the dates with missing
values, using spline interpolation. •
We now call the aspline function, specifying the series to be interpolated as
the x and y values, and the missing times as the points to be interpolated
(argument xout).
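The comparisons below use a default cubic spline and the Akima spline from the akima package; the calls are not shown in this extract, but a sketch (interpolating only from the non-missing observations) is:

library(akima)                        # provides aspline()
ok <- !is.na(gwg)                     # use only the non-missing observations
gw.fill.spline  <- spline(x=time(gwg)[ok], y=gwg[ok], xout=time(gw)[ix])
gw.fill.aspline <- aspline(x=time(gwg)[ok], y=gwg[ok], xout=time(gw)[ix])
gw.fill.aspline$y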
gw[ix]
Compare these to each other and the linear interpolator:
summary(gw.fill.aspline$y - gw.fill.spline$y)
summary(gw.fill.aspline$y - gw.fill.linear$y)
summary(gw.fill.spline$y - gw.fill.linear$y)
Task 81 : Plot the reconstructed points on the time series with missing
values, computed by three methods (linear, default spline, Akima spline)
along with the original series: (1) for the whole series; (2) for a six-month
window centred on March 1997. •
plot(gwg, main="Gap-filled time series", type="l",
ylab="Groundwater depth (m)")
points(gw.fill.aspline$x, gw.fill.aspline$y, col="red", cex=2)
points(gw.fill.aspline$x, gw.fill.aspline$y, col="red", pch=20)
points(gw.fill.spline$x, gw.fill.spline$y, col="blue", cex=2)
points(gw.fill.spline$x, gw.fill.spline$y, col="blue", pch=20)
points(gw.fill.linear$x, gw.fill.linear$y, col="brown", cex=2)
points(gw.fill.linear$x, gw.fill.linear$y, col="brown", pch=20)
points(time(gw)[ix], gw[ix], col="darkgreen", cex=2)
points(time(gw)[ix], gw[ix], col="darkgreen", pch=20)
text(2000, 35.5, "linear", col="brown", pos=2)
text(2000, 34, "default spline", col="blue", pos=2)
text(2000, 32.5, "Akima spline", col="red", pos=2)
text(2000, 31, "true value", col="dark green", pos=2)
[Figure: 'Gap-filled time series' — linear, default spline, and Akima spline reconstructions with the true values, whole series]
points(gw.fill.linear$x, gw.fill.linear$y, col="brown", cex=2)
points(gw.fill.linear$x, gw.fill.linear$y, col="brown", pch=20)
points(time(gw)[ix], gw[ix], col="darkgreen", cex=2)
points(time(gw)[ix], gw[ix], col="darkgreen", pch=20)
text(1997.5, 43, "linear", col="brown", pos=2)
text(1997.5, 42.7, "default spline", col="blue", pos=2)
text(1997.5, 42.5, "Akima spline", col="red", pos=2)
text(1997.5, 42.2, "true value", col="dark green", pos=2)
[Figure: the same comparison zoomed to a six-month window around the March 1997 gap]
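The counts below refer to filling the gaps in the series itself; a sketch of how the repaired series gwg.r might have been built (the original code, and the interpolation method used, are not shown in this extract):

sum(is.na(gwg))                                       # five gaps
gwg.r <- gwg
gwg.r[ix] <- approx(x=time(gwg), y=gwg, xout=time(gw)[ix])$y
str(gwg.r)
sum(is.na(gwg.r))                                     # no gaps remain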
## [1] 5
## Time-Series [1:360] from 1975 to 2005: 34.4 34.5 34.7 34.8 34.9 ...
sum(is.na(gwg.r))
## [1] 0
The linear and spline interpolators had little problem with single gaps in
a smooth series. What happens with longer gaps?
These are like the short gaps; no systematic problem but the result of
carelessness or occasional problems with observations.
Anatolia well. •
Again we use set.seed so your results will be the same. For convenience
in plotting, we also create a sorted version of the missing-observations
vector, using the sort function.
gwg <- gw
set.seed(0044)
(six <- sort(ix <- sample(length(gw), size=length(gw)/5)))
## [1] 1 4 8 12 19 21 23 27 33 37 41 42 51 52 56 63
## [17] 70 79 81 88 93 101 102 109 113 120 127 135 140 142 144 149
## [33] 151 152 154 158 159 161 165 168 173 177 182 183 190 206 208 211
## [49] 218 219 251 258 268 271 274 278 284 285 286 295 305 309 312 313
## [65] 321 322 324 331 334 335 338 359
time(gw)[six]
gwg[ix] <- NA
summary(gwg)
[Figure: the series with 72 missing values; groundwater depth (m) vs. time]
Task 84 : Fill these with linear interpolation and Akima splines; com-
pare the results. •
Again we use approx and aspline; for approx we must specify the rule to be
used at the ends of the series (since in this case the first observation is
missing). The default rule=1 returns NA outside the range of the remaining
observations; to get a value there we specify rule=2:
gw.fill.linear <- approx(x=time(gwg), y=gwg, xout=time(gw)[six], rule=2)
gw.fill.aspline <- aspline(x=time(gwg), y=gwg, xout=time(gw)[six])
summary(gw.fill.linear$y - gw[six])
summary(gw.fill.aspline$y - gw[six])
summary(gw.fill.aspline$y - gw.fill.linear$y)
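As a minimal check of the rule argument (using the objects defined above), compare the two rules at the very first time point, which is one of the removed observations:

approx(x=time(gwg), y=gwg, xout=time(gw)[six[1]], rule=1)$y  # NA: the target time lies outside the range of the remaining points
approx(x=time(gwg), y=gwg, xout=time(gw)[six[1]], rule=2)$y  # the value at the nearest remaining observation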
[Figure: the whole series with the 72 gaps filled; groundwater depth (m) vs. time; linear interpolation, Akima spline, and true values marked]
[Figure: detail of the gap-filled series around the 1988 extremum; groundwater depth (m) vs. time; linear and Akima-spline fills marked; true values in dark green]
Q73 : How well did the interpolators fill the longer gaps? Jump to A73
•
These occur when observations are not made for some block of time, perhaps
because of a broken instrument or budget problems. The most interesting case
is when an entire cycle (here, a year) is missing. The years covered by the
series are:
## [1] 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987
## [14] 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
## [27] 2001 2002 2003 2004
Select a “typical” year, 1997, and remove it; to make the display easier to read,
consider only the series since 1990, using the window function. We use the
which function to find the array indices for 1997.
gww <- window(gw, start=1990)
gwg <- gww
(six <- sort(which(floor(time(gwg))==1997)))
## [1] 85 86 87 88 89 90 91 92 93 94 95 96
gwg[six] <- NA
summary(gwg)
plot(gwg, main="Missing year 1997", ylab="Groundwater depth (m)")
[Figure: Missing year 1997; groundwater depth (m) vs. time, series from 1990]
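A sketch of the interpolation calls for this block gap, following the same pattern as for the scattered gaps above and predicting at the times of the missing year (the comparison of the linear fill against the true values is included for symmetry with the earlier example):

gw.fill.linear <- approx(x=time(gwg), y=gwg, xout=time(gww)[six])
gw.fill.aspline <- aspline(x=time(gwg), y=gwg, xout=time(gww)[six])
summary(gw.fill.linear$y - gww[six])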
summary(gw.fill.aspline$y - gww[six])
summary(gw.fill.aspline$y - gw.fill.linear$y)
[Figure: Gap-filled time series for the missing year 1997 (detail); groundwater depth (m) vs. time; linear and Akima-spline fills and true values marked]
For irregular series such as rainfall or stream levels, where there are rapid
changes over short times and often periods with zeroes (rainfall) or low values
(stream baseflow), gap-filling by interpolation will not work.
One simple method is to estimate the missing values from correlated time series
that have some observations in common with the gappy series and that were
observed at the times of its missing values.
This is complicated by the fact that correlations over the whole series
may not be consistent. For example, correlations of rainfall records in a
dry season may be very good (everything is low or zero most days) but
poor in a wet season (even if it rains on the same days, the amounts may
be considerably different).
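As a sketch of the idea, a hypothetical helper (the name and structure are illustrative, not part of this exercise) could calibrate a simple linear regression on the days when both stations were observed and use it to predict the gaps; the two daily series are assumed to be aligned over a common period, for example two columns of a ts.intersect object such as the one built below:

## fill gaps in series s1 from a correlated neighbouring series s2
fill.from.neighbour <- function(s1, s2) {
  both <- !is.na(s1) & !is.na(s2)       # days observed at both stations
  fit <- lm(s1[both] ~ s2[both])        # simple linear calibration
  gaps <- is.na(s1) & !is.na(s2)        # days missing at s1 but observed at s2
  s1[gaps] <- pmax(0, coef(fit)[1] + coef(fit)[2] * s2[gaps])  # no negative rainfall
  s1
}

The seasonal caveat above could be respected by fitting separate calibrations for the wet and dry seasons rather than one regression over the whole record.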
To examine this method, we use two additional rainfall stations in the
Lake Tana basin, following the procedures from §2.2.
Task 87 : Read the CSV files for stations WeteAbay and Dangila into R
objects and examine their structures. •
tana.2<- read.csv("./ds_tsa/Tana_Daily_WeteAbay.csv", skip=1, header=T,
colClasses=c(rep("integer",2), rep("character",12)),
blank.lines.skip=T,na.strings=c("N.A","NA"," "))
str(tana.2)
## $ Mar : chr "0" "0" "0" "0" ...
## $ Apr : chr "0" "0" "0" "12.9" ...
## $ May : chr "0" "0" "0" "12.6" ...
## $ Jun : chr "0" "12.6" "10.4" "10.3" ...
## $ Jul : chr "9" "0.3" "8.4" "46.6" ...
## $ Aug : chr "8.4" "4.4" "6.6" "6.4" ...
## $ Sep : chr "14" "0" "10.9" "1.4" ...
## $ Oct : chr "10.1" "27.8" "0" "20.2" ...
## $ Nov : chr "0" "0" "0" "0" ...
## $ Dec : chr "0" "0" "0" "0" ...
sum(is.na(tana.2[,3:14]))
## [1] 0
sum(is.na(tana.3[,3:14]))
## [1] 383
Task 88 : Set the trace values and any measurements below 0.1 to
zero. •
require(car)
for (i in 3:14) {
tana.2[,i] <- recode(tana.2[,i], "c('TR','tr','0.01')='0'")
}
for (i in 3:14) {
tana.3[,i] <- recode(tana.3[,i], "c('TR','tr','0.01')='0'")
}
sum(c(tana.2[,3:14],tana.3[,3:14])=="TR", na.rm=TRUE)
## [1] 0
sum(c(tana.2[,3:14],tana.3[,3:14])=="tr", na.rm=TRUE)
## [1] 0
sum(c(tana.2[,3:14],tana.3[,3:14])=="0.01", na.rm=TRUE)
## [1] 0
sum(c(tana.2[,3:14],tana.3[,3:14])=="0", na.rm=TRUE)
## [1] 0
Task 89 : Organize the daily values as one long vector of values, as
required for time series analysis. •
Which years do we have?
sort(unique(tana$YEAR))
## [1] 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993
## [14] 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
sort(unique(tana.2$Year))
sort(unique(tana.3$Year))
## [1] 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
## [14] 2000 2001 2002 2003 2004 2005 2006
tana.2[tana.2$DATE==29,"FEB"]
## NULL
tana.3[tana.3$DATE==29,"FEB"]
## NULL
## days per month; the two leading zeroes align the vector with columns 3:14
month.days <- c(0,0,31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
tana.2.ppt <- NULL
## each year occupies a 32-row block in the imported sheet; stack the days of
## each month, year by year, into one long vector of daily values
for (yr.first.row in seq(from=1, by=32, length=(2006 - 2000 + 1))) {
  for (month.col in 3:14) {
    tana.2.ppt <-
      c(tana.2.ppt,
        tana.2[yr.first.row:(yr.first.row + month.days[month.col]-1),
               month.col])
  }
}
str(tana.2.ppt)
## chr [1:2555] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" ...
length(tana.2.ppt)/365
## [1] 7
The same procedure applied to the third station (Dangila) gives a daily vector covering 20 years:
## chr [1:7300] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" ...
length(tana.3.ppt)/365
## [1] 20
Task 90 : Convert this to a time series with the appropriate metadata.
•
Again, the ts function is used to convert the series; the frequency argument
specifies a cycle of 365 days and the start argument specifies the beginning of
each series:
## the daily values were read as character strings, so also convert to numeric
tana.2.ppt <- ts(as.numeric(tana.2.ppt), start=2000, frequency=365)
str(tana.2.ppt)
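The third station's vector needs the same treatment; a sketch, assuming its record starts in 1987 as its Year column shows:

tana.3.ppt <- ts(as.numeric(tana.3.ppt), start=1987, frequency=365)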
Task 91 : Plot the two time series, and the original station for comparison. •
plot(tana.ppt, main="Lake Tana rainfall, Station 1",
ylab="mm", sub="Missing dates with red bars")
abline(h=60, col="gray")
points(xy.coords(x=time(tana.ppt), y=60, recycle=T),
pch=ifelse(is.na(tana.ppt),"l",""), col="red")
grid()
[Figure: Lake Tana rainfall, Station 1; mm vs. time; missing dates marked with red bars]
[Figure: Lake Tana rainfall, second station; mm vs. time; missing dates marked with red bars]
[Figure: Lake Tana rainfall, third station; mm vs. time; missing dates marked with red bars]
Q75 : Do the three stations cover the same time period? Do they have
the same missing dates? Does every date have at least one observation
(from at least one of the stations)? Jump to A75 •
## [1] 2216 2217 2218 2219 2220 2221 2247 2248 2249 2275 2276 2277 2278
## [14] 2279 2280 2306 2307 2308 2309 2310 2336 2337 2338 2339 2340 2341
## [27] 2367 2368 2369 2370 2371 2397 2398 2399 2400 2401 2402 2428 2429
## [40] 2430 2431 2432 2433 2459 2460 2461 2462 2463 2489 2490 2491 2492
## [53] 2493 2494 2520 2521 2522 2523 2524 2550 2551 2552 2553 2554 2555
The intersect set operator gives the missing dates that two of the series have
in common:
length(miss.12 <- intersect(time.miss.1, time.miss.2))
## [1] 0
The corresponding intersections with the third station's missing dates are not empty:
## [1] 96
## [1] 64
Task 92 : Find the common period of the three time series, and plot
them on one graph. •
This is the same procedure as in §6, i.e., using the ts.intersect function to
create a “multiple time series” object of class mts:
t3 <- ts.intersect(tana.ppt, tana.2.ppt, tana.3.ppt)
str(t3)
class(t3)
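One way to produce the figure (a sketch; by default plot on an mts object draws one panel per series, while plot.type="single" would overlay them in a single panel):

plot(t3, main="Lake Tana rainfall, three stations")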
[Figure: the three rainfall series over their common period]
7.5 Answers
A70 : After deletion there are five NA (missing) values. The median and mean
change slightly because of the deleted values. Return to Q70 •
A71 : Fairly well, especially when the missing value was in a linear section of
the graph (steady increase or decrease in depth). However, for March 1997 it
missed the high-water stand substantially; this is because that point happens
to be a local extremum and not in a linear section of the curve. Return to Q71
•
A72 : The three interpolators are very similar for the gaps in linear sections
of the graph; however for the extremum of March 1997 there are substantial
differences; both spline interpolators are much closer (about 30 cm) to the true
value. Return to Q72 •
A73 : For gaps in the linear sections of the graph, both interpolators performed
reasonably well. However, with longer gaps around extrema, as in early 1988,
both underestimate the extremes. Akima splines performed better than linear
interpolation. Return to Q73 •
A75 : The stations all end in December 2006, but they start in 1981 (Bahir
Dar), 1987 (Dangila), and 2000 (Wete Abay). The first and second stations have
no missing dates in common, whereas the third station has some of the same
missing dates as the other two stations. Return to Q75 •
8 Simulation
8.1 AR models
Task 93 : Simulate three realizations of fifteen years of groundwater
level remainders with the AR(1) model from §4.4.1, and compare with
the actual series. •
Recall that the fitted AR(1) model is in object ar.gw.r.1; the coefficients
are in field ar of that object, and the white-noise (residual) variance in field
var.pred:
par(mfrow=c(4,1))
for (i in 1:3) {
plot(arima.sim(model=list(ar=ar.gw.r.1$ar), n=12*15,
rand.gen=rnorm, sd=sqrt(ar.gw.r.1$var.pred)),
main=paste("Simulated AR(1) process",i), ylab="modelled")
abline(h=0, lty=2)
}
plot(window(gw.r, 1989, 1989+15),
main="Remainders 1989 -- 2004", ylab="actual")
abline(h=0, lty=2)
par(mfrow=c(1,1))
[Figure: three simulated realizations of the AR(1) process (modelled) and the actual remainders 1989 to 2004]
Q76 : How well do the simulations reproduce the structure of the actual
series? Jump to A76 •
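One way to go beyond a visual impression (a small sketch using the objects above) is to compare the autocorrelation function of one simulated realization with that of the actual remainder series over the first few lags:

sim.1 <- arima.sim(model=list(ar=ar.gw.r.1$ar), n=12*15,
                   rand.gen=rnorm, sd=sqrt(ar.gw.r.1$var.pred))
acf.sim <- acf(sim.1, lag.max=12, plot=FALSE)                       # ACF of one realization
acf.act <- acf(window(gw.r, 1989, 1989+15), lag.max=12, plot=FALSE) # ACF of the actual remainders
round(cbind(simulated=drop(acf.sim$acf), actual=drop(acf.act$acf)), 2)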
Task 94 : Repeat for the fitted AR(2) model. •
The fitted AR(2) model is in object ar.gw.r.
par(mfrow=c(4,1))
for (i in 1:3) {
plot(arima.sim(model=list(ar=ar.gw.r$ar), n=12*15,
rand.gen=rnorm, sd=sqrt(ar.gw.r$var.pred)),
main=paste("Simulated AR(2) process",i), ylab="modelled")
abline(h=0, lty=2)
}
plot(window(gw.r, 1989, 1989+15),
main="Remainders 1989 -- 2004", ylab="actual")
abline(h=0, lty=2)
par(mfrow=c(1,1))
[Figure: three simulated realizations of the AR(2) process (modelled) and the actual remainders 1989 to 2004]
Q77 : How well do the AR(2) simulations reproduce the structure of the
actual series? Jump to A77 •
8.2 Answers
A76 : The AR(1) simulations seem somewhat noisier than the actual series.
The scale seems correct. Return to Q76 •
A77 : The AR(2) simulations seem to match the actual series better. Return
to Q77 •
References
[1] Hiroshi Akima. A new method of interpolation and smooth curve fitting based on local procedures. Journal of the ACM, 17(4):589–602, 1970. 112
[2] D Bates. Fitting linear mixed models in R. R News, 5(1):27–30, 2005. 63
[3] David Birkes and Yadolah Dodge. Alternative methods of regression. John Wiley & Sons, Inc., New York, 1993. 72
[4] George E. P. Box. Time series analysis: forecasting and control. John Wiley & Sons, Inc., fifth edition, 2016. ISBN 978-1-118-67492-5. URL http://ebookcentral.proquest.com/lib/cornell/detail.action?docID=2064681. 1
[5] George E. P. Box, Gwilym M. Jenkins, and Gregory C. Reinsel. Time series analysis: forecasting and control. Prentice-Hall, Englewood Cliffs, NJ, 3rd edition, 1994. 76
[6] William S. Cleveland and Susan J. Devlin. Locally weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association, 83(403):596–610, Sep 1988. ISSN 0162-1459, 1537-274X. doi: 10.1080/01621459.1988.10478639. 32
[7] JC Davis. Statistics and data analysis in geology. John Wiley & Sons, New York, 3rd edition, 2002. 1
[8] Peter J. Diggle. Time series: a biostatistical introduction. Oxford Statistical Science Series 5. Oxford Science Publications, Oxford, 1989. 1
[9] John Fox. An R and S-Plus Companion to Applied Regression. Sage Publications, Thousand Oaks, CA, USA, 2002. 15
[10] Keith W. Hipel and A. Ian McLeod. Time series modelling of water resources and environmental systems. Number 45 in Developments in Water Science. Elsevier, 1994. ISBN 9780444892706. URL http://www.stats.uwo.ca/faculty/aim/1994Book/. 1, 72, 98
[11] Robert M. Hirsch, James M Slack, and Richard A Smith. Techniques of trend analysis for monthly water quality data. Water Resources Research, 18(1):107–121, 1982. 72, 73
[12] R Ihaka and R Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314, 1996. 1
[13] Andrew V. Metcalfe and Paul S. P. Cowpertwait. Introductory Time Series with R. Use R! Springer, 2009. DOI: 10.1007/978-0-387-88698-5. 1
[14] R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2004. URL http://www.R-project.org. ISBN 3-900051-07-0. 1
[15] Jose D. Salas. Analysis and modeling of hydrologic time series. In David R. Maidment, editor, Handbook of hydrology, pages 19.1–19.72. McGraw-Hill, New York, 1993. 1, 107
[16] Robert H. Shumway and David S. Stoffer. Time series analysis and its applications: with R examples. Springer, New York, 2nd edition, 2006. 1
[17] R Development Core Team. R Data Import/Export. The R Foundation for Statistical Computing, Vienna, version 2.9.0 (2009-04-17) edition, 2009. URL http://cran.r-project.org/doc/manuals/R-data.pdf. 12
[18] WN Venables and BD Ripley. Modern applied statistics with S. Springer-Verlag, New York, fourth edition, 2002. 1
[19] D S Wilks and R L Wilby. The weather generation game: a review of stochastic weather models. Progress in Physical Geography, 23(3):329–357, 1999. 125
[20] Daniel S. Wilks. Statistical methods in the atmospheric sciences. International Geophysics Series 59. Academic Press, New York, 1995. 1, 79, 82
Index of R Concepts
[[]] operator, 25
[] operator, 10, 78, 110
$ operator, 44
~ operator, 25
abline, 99
acf, 42, 45, 47, 63, 79, 89, 90
aggregate, 26, 28
akima package, 112
anova, 65
approx, 110, 115
ar, 81, 82, 84
arima, 83, 84, 92
arima.sim, 125
as.numeric, 18, 24
as.vector, 99
aspline (package:akima), 111, 112, 115
attributes, 3
boxplot, 25
by, 25
c, 6, 16
car package, 15
ccf, 103, 105
corAR1 (nlme package), 63
correlation argument (gls function), 63
corStruct class, 63
cycle, 4, 10, 24
deltat, 5
diff, 7, 8
end, 3
end argument (window function), 18
excel_sheets (readxl package), 12
extend argument (window function), 18
file.show, 2, 12
filter, 29
fitted, 85
floor, 24, 117
frequency, 5, 34, 86
FUN argument (aggregate function), 28
FUN argument (by function), 25
IND argument (by function), 25
index.return argument (sort function)
intersect, 123
is.na, 14, 123
Kendall package, 72, 74, 98
lag, 41, 78
lag.plot, 40, 77
lags argument (lag.plot function), 40
lm, 60, 63, 78
lowess, 32, 34
MannKendall (package:Kendall), 72
match, 27
max, 18, 25, 28, 105
mean, 28
median, 25, 28
min, 25, 28
mts class, 55, 101, 124
NA constant, 109
na.action, 110
na.contiguous, 20
na.exclude, 110
na.omit, 73, 99, 110
nfrequency argument (aggregate function), 28
nlme package, 63
pacf, 46, 64, 90
plot, 5, 17, 18, 36, 71
plot.stl, 36
plot.ts, 5, 101
plot.type argument (spectrum function), 106
points, 17
predict, 68, 94
predict.Arima, 94
predict.lm, 68
print, 4
print.ts, 4
quantile, 25
readxl package, 12
recode, 15
require, 15
rule argument (approx function), 115, 116
unique, 14
v graphics argument, 99